Abstract

We combine an axiomatics of Rényi with the q-deformed version of Khinchin axioms to obtain a measure of information (i.e.,
entropy) which accounts both for systems with embedded self-similarity and non-extensivity. We show that the entropy thus
obtained is uniquely solved in terms of a one-parameter family of information measures. The ensuing maximal-entropy distribution
is phrased in terms of a special function known as the Lambert W-function. We analyze the corresponding "high-" and
"low-temperature" asymptotics and reveal a non-trivial structure of the parameter space.
1. Introduction

In his 1948 paper [1] Shannon formulated the theory of data compression. The paper established a fundamental
limit to lossless data compression and showed that this limit coincides with the information measure presently known
as Shannon's entropy $\mathcal{H}$. In words, it is possible to compress the source, in a lossless manner, with a compression
rate close to $\mathcal{H}$, but it is mathematically impossible to do better than $\mathcal{H}$. However, many modern
communication processes, including signals, images and coding/decoding systems, often operate in complex environments dominated by
conditions that do not match the basic tenets of Shannon's communication theory. For instance, the buffer memory (or
storage capacity) of a transmitting channel is often finite, coding can have a non-trivial cost function, codes might
have variable lengths, sources and channels may exhibit memory or losses, etc. Information theory offers various
generalized (non-Shannonian) measures of information to deal with such cases. Among the most frequently used
one can mention, e.g., the Havrda-Charvát measure [2], the Sharma-Mittal measure [3], Rényi's measure [4] or Kapur's
measures [5]. Information entropies get even more complex when one considers communication systems with quantum
channels [6, 7]. There even exist attempts to generalize Shannon's measure of information in a direction where
no use of the concept of probability is needed, hence demonstrating that information is a more primitive notion than
probability [8].

In the mid 1950s Jaynes [9] proposed the Maximum Entropy Principle (MaxEnt) as a general inference procedure that,
among others, bears a direct relevance to statistical mechanics and thermodynamics. The conceptual frame of Jaynes's
MaxEnt is formed by Shannon's communication theory with Shannon's information measure as an inference functional.
The central role of Shannon's entropy as a tool for inductive inference (i.e., inference where new information
is given in terms of expected values) was further demonstrated in the works of Faddeyev [10], Shore and Johnson [11],
Wallis [12], Topsøe [13] and others. In Jaynes's procedure the laws of statistical mechanics can be viewed as inferences
based entirely on prior information that is given in the form of expected values of energy, energy and number of
particles, energy and volume, energy and angular momentum, etc., thus re-deriving the familiar canonical ensemble,
grand-canonical ensemble, pressure ensemble, rotational ensemble, etc., respectively [?]. A remarkable feature of this
procedure is that it entirely dispenses with such traditional hypotheses as ergodicity or metric transitivity. Following
Jaynes, one should view the MaxEnt distribution (or maximizer) as a distribution that is maximally noncommittal
with regard to missing information and that agrees with all that is known about prior information, but expresses
maximum uncertainty with respect to all other matters [?]. By identifying the statistical sample space with the set
of all (coarse-grained) microstates the corresponding maximizer yields the Shannon entropy that corresponds to the
Gibbs entropy of statistical physics.

Surprisingly, despite the aforementioned connection between information theory and physics and despite related
advancements in non-Shannonian information theory, tendencies aiming at similar extensions of the Gibbs entropy
paradigm started to penetrate into statistical physics only in the last two decades. This happened when evidence
accumulated showing that there are indeed many situations of practical interest requiring more "exotic" statistics which
do not conform with Gibbsian exponential maximizers. Percolation, protein folding, critical phenomena, cosmic rays,
turbulence, granular matter or stock market returns might provide examples.

In attacking the problem of generalization of Gibbs's entropy the information-theoretic route to equilibrium statistical
physics provides a very useful conceptual guide. The natural strategy that fits this framework would then be to
revisit the axiomatic rules governing Shannon's information measure and to translate potential extensions into the language
of statistical physics. In fact, the usual axiomatics of Khinchin [15] is prone to several "plausible" generalizations.
Among those, the additivity of independent mean information is a natural axiom to attack. Along those lines, two
fundamentally distinct generalization schemes have been pursued in the literature: one redefining the statistical mean
and another generalizing the additivity rule.

The first mentioned generalization was realized by Rényi by employing the most general means still compatible
with Kolmogorov axioms of probability theory. These so-called quasi-linear means were independently studied by
Kolmogorov [16] and Nagumo [17]. It was shown that the generalization based on quasi-linear means unambiguously
leads to information measures known as Rényi entropies [4, 18]. Although the status of Rényi entropies (RE's)
in statistical physics is still debated, they nevertheless provide an immensely important analyzing tool in classical
statistical systems with a non-standard scaling behavior (e.g., fractals, multifractals, etc.) [19, 20].

On the other hand, the second approach generalizes the additivity prescription but keeps the usual linear mean.
A currently popular generalization is the q-additivity prescription and the related q-calculus [21, 22]. The corresponding
axiomatics [23] provides the entropy known as Tsallis-Havrda-Charvát's (THC) entropy [?]. As the classical additivity
of independent information is destroyed in this case, new, more exotic physical mechanisms must be sought to
comply with THC predictions. Recent theoretical advances in systems with long-range interactions [26], in generalized
(and specifically q-generalized) central limit theorems [27], in the theory of asymptotic scaling [28], etc., indicate
that the typical playground for THC entropy should be in cases where two statistically independent systems have
non-vanishing long-range/time correlations or where the notion of statistical independence is an ill-defined concept.
Examples include long-range Ising models, gravitational systems, statistical systems with quantum non-locality, etc.

It is clear that an appropriate combination of the above generalizations could provide a new conceptual paradigm
suitable for a statistical description of systems possessing both self-similarity and non-locality. Such systems are quite
pertinent, with examples spanning from the early-universe cosmological phase transitions to currently much studied
quantum phase transitions (frustrated spin systems, Fermi liquids, etc.). In passing we should mention that there exists
a number of works trying to compare Rényi and THC entropies from both the theoretical and the observational point
of view (see, e.g., Refs. [29, 30]). Nevertheless, the merger of both entropic paradigms has not been studied yet. It is
the aim of this paper to pursue this line of reasoning and explore the resulting implications. In order to set the stage for
our considerations we review in the following section some axiomatic essentials for Shannon, Rényi and THC
entropies that will be needed in the main body of the paper. In Section 3 we then formulate a new axiomatics which


1 Other important approaches such as Kaniadakis's [24] and Naudts's [25] deformed Hartley's logarithmic information also utilize linear means
and a generalized additivity rule (e.g., κ-additivity) but as yet they still lack the information-theoretic axiomatics that is crucial in our reasoning. For
this reason we exclude these works from our consideration.
aims at bridging the Rényi and THC entropies. It is found that such axiomatics allows for only one one-parametric
family of solutions. Basic properties of the new entropy, which we denote as $\mathcal{D}_q$, are discussed. A simplification that
$\mathcal{D}_q$ undergoes in multifractal systems is particularly emphasized. The corresponding MaxEnt distributions are calculated
in Section 4. We utilize both linear and non-linear moment constraints (applied to the energy) to achieve this goal.
In both aforementioned cases the distributions are expressible through the Lambert W-function. Since the analytic
structure of MaxEnt distributions is too complex we confine our analysis to the corresponding "high-" and
"low-temperature" asymptotics and discuss the ensuing non-trivial structure of the parameter space. In Section 5 we discuss
the concavity and Schur-concavity of $\mathcal{D}_q$. Section 6 is devoted to conclusions. The paper is supplemented with three
appendices which clarify some finer mathematical points.


2. Brief review of entropy axiomatics 

The information measure, or simply entropy, is supposed to represent the measure or degree of uncertainty or
expectation in conveyed information which is going to be removed by the recipient. As a rule in information theory
the exact value of entropy depends only on the information source, more specifically on the statistical nature of
the source. Generally speaking, the higher the information measure, the higher the ignorance about the system
(source) and thus the more information will be uncovered after the message is received (or an actual measurement is
performed). As often happens, this simple scenario is frequently not tenable as various restrictive factors are present
in realistic situations: finite buffer capacity, global patterns in messages, topologically non-trivial sample spaces, etc.
One may even entertain various information-theoretic implications related to the quantum probability calculus or
quantum communication channels. Thus, as we go to somewhat more elaborate and realistic models, the entropy
prescriptions get more complicated and more realistic.

To see why a new generalization of the entropy is desirable let us briefly dwell on the three most common entropy
protagonists, namely Shannon's, Rényi's and THC entropy.


2.1. Shannon's entropy — Khinchin axioms 


The best known and most widely used information measure is Shannon's entropy. For the sake of completeness we now
briefly recapitulate the Khinchin axiomatics as this will prove important in what follows. It consists of four axioms [15]:

1. For a given integer n and given $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$ ($p_k \geq 0$, $\sum_k p_k = 1$), $\mathcal{H}(\mathcal{P})$ is a continuous
function with respect to all its arguments.

2. For a given integer n, $\mathcal{H}(p_1, p_2, \ldots, p_n)$ takes its largest value for $p_k = 1/n$ ($k = 1, 2, \ldots, n$).

3. For a given $q \in \mathbb{R}$; $\mathcal{H}(A \cup B) = \mathcal{H}(A) + \mathcal{H}(B|A)$ with $\mathcal{H}(B|A) = \sum_k p_k \mathcal{H}(B|A = A_k)$, and distribution $\mathcal{P}$
corresponds to the experiment A.

4. $\mathcal{H}(p_1, p_2, \ldots, p_n, 0) = \mathcal{H}(p_1, p_2, \ldots, p_n)$, i.e., by adding an event of probability zero (impossible event) we do not
gain any new information.

The corresponding information measure, Shannon's entropy, then reads (up to a normalization constant, see footnote 2)

$$ \mathcal{H}(\mathcal{P}) = -\sum_{k=1}^{n} p_k \ln p_k . $$   (1)


In passing we should stress two important points. Firstly, 3rd axiom (known as separability or strong additivity axiom) 
indicates that Shannon’s entropy of two independent experiments (sources) is additive. Secondly, there is an intimate 
connection between the Boltzmann-Gibbs entropy and Shannon’s entropy. In fact, thermodynamics can be viewed as 
a specific application of Shannon’s information theory: the thermodynamic entropy may be interpreted (when rescaled 
to “bit” units) as the amount of Shannon information needed to define the detailed microscopic state of the system, 
which remains “uncommunicated” by a description that is solely in terms of thermodynamic state variables. 
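The strong-additivity axiom can be checked numerically. The following is a minimal sketch, assuming a small hypothetical joint distribution `joint` (the numbers are illustrative and not taken from the text); it evaluates Eq. (1) and compares $\mathcal{H}(A \cup B)$ with $\mathcal{H}(A) + \mathcal{H}(B|A)$.

```python
import math

def shannon(p):
    """Shannon entropy in nats; zero-probability events are skipped (axiom 4)."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical joint distribution: joint[k][l] = Prob(A = A_k, B = B_l).
joint = [[0.10, 0.30],
         [0.25, 0.05],
         [0.20, 0.10]]

p_a = [sum(row) for row in joint]                      # distribution of experiment A
h_joint = shannon([x for row in joint for x in row])   # H(A u B)
# H(B|A) = sum_k p_k * H(B | A = A_k), i.e. the linear mean of axiom 3.
h_cond = sum(pk * shannon([x / pk for x in row]) for pk, row in zip(p_a, joint))

print(h_joint, shannon(p_a) + h_cond)   # the two numbers agree up to rounding
```

For independent experiments the conditional term reduces to $\mathcal{H}(B)$, which is the additivity property stressed above.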


2 The normalization influences the base of the logarithm. In information theory, it is common to choose the normalization $\mathcal{H}(1/2, 1/2) = 1$, leading to
binary logarithms. We adopt physical conventions and in the whole text use the normalization leading to natural logarithms.
2.2. Rényi's entropy: entropy of multifractal systems

As already mentioned, RE represents a step further towards more realistic situations encountered in information
theory. Among a myriad of information measures, RE's distinguish themselves by firm operational characterizations.
These were established by Arikan [31] for the theory of guessing, by Jelinek [32] for the buffer overflow problem
in lossless source coding, by Campbell [33] for the lossless variable-length coding problem with an exponential cost
constraint, etc. Recently, an interesting operational characterization of RE was provided by Csiszár [34] in terms of
block coding and hypotheses testing. In the latter case the Rényi parameter q was directly related to so-called β-cutoff
rates [?].

Apart from information theory RE's have proved to be an indispensable tool also in numerous branches of physics.
Typical examples are provided by chaotic dynamical systems and multifractal statistical systems (see, e.g., [35] and
citations therein). Fully developed turbulence, earthquake analysis and generalized dimensions of strange attractors
provide examples.

RE of order q (q > 0) of a discrete distribution $\mathcal{P} = \{p_1, \ldots, p_n\}$ are defined as

$$ \mathcal{I}_q(\mathcal{P}) = \frac{1}{1-q} \ln\left( \sum_{k=1}^{n} (p_k)^q \right) . $$   (2)


In his original work, Rényi [4, 18] introduced the one-parameter family of information measures (2), which he based on
axiomatic considerations. In the course of time these axioms have been sharpened by Daróczy [36] and others [37].
Most recently it was shown that RE can be uniquely derived from the following set of axioms [35]:

1. For a given integer n and given $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$ ($p_k \geq 0$, $\sum_k p_k = 1$), $\mathcal{I}(\mathcal{P})$ is a continuous
function with respect to all its arguments.

2. For a given integer n, $\mathcal{I}(p_1, p_2, \ldots, p_n)$ takes its largest value for $p_k = 1/n$ ($k = 1, 2, \ldots, n$).

3. For a given $q \in \mathbb{R}$; $\mathcal{I}(A \cup B) = \mathcal{I}(A) + \mathcal{I}(B|A)$ with $\mathcal{I}(B|A) = g^{-1}\left(\sum_k \varrho_k(q)\, g(\mathcal{I}(B|A = A_k))\right)$, and
$\varrho_k(q) = (p_k)^q / \sum_k (p_k)^q$ (distribution $\mathcal{P}$ corresponds to the experiment A). Here g is invertible and positive in $[0, \infty)$.

4. $\mathcal{I}(p_1, p_2, \ldots, p_n, 0) = \mathcal{I}(p_1, p_2, \ldots, p_n)$.


The above axioms markedly differ from those utilized in [4, 18, 36, 37]. Particularly distinctive is the presence of the
escort (or zooming) distribution $\varrho(q)$ in the 3rd axiom. The distribution $\varrho(q)$ was originally introduced by Rényi [?]
to define the entropy associated with the joint distribution. Quite independently, $\varrho(q)$ was introduced by Beck and
Schlögl [?] in the context of non-linear dynamics.

We briefly recall some elementary properties of $\mathcal{I}_q$: it is symmetric in all arguments; for q < 1 it is a concave
function and $\mathcal{H}(\mathcal{P}) \leq \mathcal{I}_q(\mathcal{P})$, while for q > 1 it is neither concave nor convex and $\mathcal{I}_q(\mathcal{P}) \leq \mathcal{H}(\mathcal{P})$. On the other
hand, RE of any order are Schur-concave functions [38]. In fact, every function $f(\mathcal{P})$ which is Schur-concave can
represent a reasonable measure of information, since it is maximized by the uniform probability distribution, while
the minimum is attained for concentrated distributions $\mathcal{P} = \{p_i = 1, p_{j \neq i} = 0\}$. Some further properties can be found,
e.g., in Refs. [?].

Note particularly that RE of two independent experiments (sources) is additive. In fact, it was proved in Ref. [?]
that RE is the most general information measure compatible with additivity of independent information and
Kolmogorov axioms of probability theory.
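The additivity of RE for independent sources and its relation to Shannon's entropy can be illustrated numerically. The sketch below assumes two small hypothetical distributions `p_a` and `p_b` chosen only for illustration; it evaluates Eq. (2) and checks that $\mathcal{I}_q$ of the product distribution equals the sum of the individual entropies.

```python
import math

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, q):
    """Renyi entropy of order q > 0, Eq. (2); q = 1 is treated as the Shannon limit."""
    if abs(q - 1.0) < 1e-12:
        return shannon(p)
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

# Hypothetical independent sources.
p_a = [0.5, 0.3, 0.2]
p_b = [0.6, 0.4]
joint = [a * b for a in p_a for b in p_b]   # product distribution

for q in (0.5, 2.0, 3.0):
    print(q, renyi(joint, q), renyi(p_a, q) + renyi(p_b, q))

# Close to q = 1 the Renyi entropy approaches the Shannon entropy.
print(renyi(p_a, 1.0 + 1e-6), shannon(p_a))
```

The last line also reflects the ordering quoted above: $\mathcal{I}_q$ lies above $\mathcal{H}$ for q < 1 and below it for q > 1.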


2.3. THC entropy: entropy of long distance correlated systems 

THC entropy was originally introduced in 1967 by Havrda and Charvát in the context of information theory of
computerized systems [2] and together with the α-norm entropy measure [?] it belongs to the class of pseudo-additive
entropies. In contrast with Rényi's or Shannon's entropy, THC entropy does not have (as yet) an operational
characterization. The Havrda-Charvát structural entropy, though quite well known among information theorists, had remained
largely unknown in the physics community. It took more than two decades until Tsallis, in his pioneering work [?] on
generalized (or non-extensive) statistics, rediscovered this entropy. Since then THC entropy has been employed in many
physical systems. In this connection one may particularly mention Hamiltonian systems with long-range interactions,
granular systems, complex networks, stock market returns, etc. For a recent review see, e.g., Ref. [42].

In the case of a discrete distribution $\mathcal{P} = \{p_1, \ldots, p_n\}$ the THC entropy takes the form:

$$ \mathcal{S}_q(\mathcal{P}) = \frac{1}{1-q} \left[ \sum_{k=1}^{n} (p_k)^q - 1 \right], \qquad q > 0 . $$   (3)


Various axiomatic treatments of THC entropy were proposed in the literature. For our purpose the most convenient
set of axioms is the following [23]:

1. For a given integer n and given $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$ ($p_k \geq 0$, $\sum_k p_k = 1$), $\mathcal{S}(\mathcal{P})$ is a continuous
function with respect to all its arguments.

2. For a given integer n, $\mathcal{S}(p_1, p_2, \ldots, p_n)$ takes its largest value for $p_k = 1/n$ ($k = 1, 2, \ldots, n$).

3. For a given $q \in \mathbb{R}$; $\mathcal{S}(A \cup B) = \mathcal{S}(A) + \mathcal{S}(B|A) + (1-q)\,\mathcal{S}(A)\,\mathcal{S}(B|A)$ with

$$ \mathcal{S}(B|A) = \sum_k \varrho_k(q)\, \mathcal{S}(B|A = A_k), $$

and $\varrho_k(q) = (p_k)^q / \sum_k (p_k)^q$ (distribution $\mathcal{P}$ corresponds to the experiment A).

4. $\mathcal{S}(p_1, p_2, \ldots, p_n, 0) = \mathcal{S}(p_1, p_2, \ldots, p_n)$.


As we said before, one keeps here the linear mean but generalizes the additivity law. In fact, the additivity law in
axiom 3 is nothing but the Jackson sum known from the q-calculus [43]: there one defines the Jackson basic number
$[X]_{\{q\}}$ of a quantity X as

$$ [X]_{\{q\}} = \frac{q^X - 1}{q - 1} \quad \Rightarrow \quad [X + Y]_{\{q\}} = [X]_{\{q\}} + [Y]_{\{q\}} + (q-1)\,[X]_{\{q\}}\,[Y]_{\{q\}} . $$   (4)


The connection with axiom 3 is then established when $q \to (2 - q)$. A nice feature of the q-calculus is that it formalizes
many mathematical manipulations. For instance, using the q-logarithm

$$ \ln_{\{q\}} x = \frac{1}{1-q}\left( x^{1-q} - 1 \right), $$   (5)

THC entropy can be concisely written as the q-deformed Shannon's entropy, i.e.,

$$ \mathcal{S}_q(\mathcal{P}) = -\sum_{k=1}^{n} p_k \ln_{\{2-q\}} p_k = -\sum_{k=1}^{n} (p_k)^q \ln_{\{q\}} p_k = \sum_{k=1}^{n} p_k \ln_{\{q\}} \frac{1}{p_k} . $$   (6)
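The q-logarithmic representations of THC entropy and the non-additivity rule of axiom 3 are easy to verify numerically. The following sketch assumes two hypothetical strictly positive distributions `p_a` and `p_b`; it compares the three forms in Eq. (6) with the definition (3) and checks the pseudo-additivity for independent sources, where $\mathcal{S}(B|A)$ reduces to $\mathcal{S}(B)$.

```python
import math

def ln_q(x, q):
    """q-logarithm of Eq. (5); reduces to the natural logarithm for q = 1."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def thc(p, q):
    """Tsallis-Havrda-Charvat entropy, Eq. (3), for q != 1."""
    return (sum(x ** q for x in p) - 1.0) / (1.0 - q)

q = 0.7
p_a = [0.5, 0.3, 0.2]   # hypothetical distributions, all entries > 0
p_b = [0.6, 0.4]

form_1 = -sum(x * ln_q(x, 2.0 - q) for x in p_a)
form_2 = -sum(x ** q * ln_q(x, q) for x in p_a)
form_3 = sum(x * ln_q(1.0 / x, q) for x in p_a)
print(thc(p_a, q), form_1, form_2, form_3)

# Pseudo-additivity for independent experiments A and B.
joint = [a * b for a in p_a for b in p_b]
lhs = thc(joint, q)
rhs = thc(p_a, q) + thc(p_b, q) + (1.0 - q) * thc(p_a, q) * thc(p_b, q)
print(lhs, rhs)
```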


Some elementary properties of $\mathcal{S}_q$ are positivity, concavity (and Schur-concavity) for all values of q and indeed
non-extensivity. There also hold inequalities between all three entropies, namely:

$$ \mathcal{H}(\mathcal{P}) \leq \mathcal{I}_q(\mathcal{P}) \leq \mathcal{S}_q(\mathcal{P}), $$   (7)

for $0 < q \leq 1$, and

$$ \mathcal{S}_q(\mathcal{P}) \leq \mathcal{I}_q(\mathcal{P}) \leq \mathcal{H}(\mathcal{P}), $$   (8)

for $q \geq 1$. For a monograph that covers this subject in more depth the reader is referred to Ref. [44].
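The orderings (7) and (8) can be scanned over a range of q values. This is a minimal sketch under the assumption of a hypothetical three-event distribution `p`; the helper `ordering_holds` is an illustrative name, not part of the original text.

```python
import math

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, q):
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

def thc(p, q):
    return (sum(x ** q for x in p if x > 0) - 1.0) / (1.0 - q)

def ordering_holds(p, q, tol=1e-12):
    """True if (7) holds for q < 1, or (8) holds for q > 1."""
    h, i, s = shannon(p), renyi(p, q), thc(p, q)
    if q < 1:
        return h <= i + tol and i <= s + tol
    return s <= i + tol and i <= h + tol

p = [0.5, 0.3, 0.2]   # hypothetical distribution
q_grid = [0.1 * k for k in range(1, 51) if k != 10]   # q = 0.1, ..., 5.0 without q = 1
print(all(ordering_holds(p, q) for q in q_grid))
```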


3. J-A axioms and solutions 

It would be conceptually desirable to have a unifying axiomatic framework in which the properties of both Rényi and
THC entropies are represented. In Ref. [45] one of us proposed the following natural synthesis of the previous
two axiomatics:

1. For a given integer n and given $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$ ($p_k \geq 0$, $\sum_k p_k = 1$), $\mathcal{D}(\mathcal{P})$ is a continuous
function with respect to all its arguments.

2. For a given integer n, $\mathcal{D}(p_1, p_2, \ldots, p_n)$ takes its largest value for $p_k = 1/n$ ($k = 1, 2, \ldots, n$).
[Figure 1: three panels titled "Hybrid entropy", "Tsallis entropy" and "Rényi entropy", showing $\mathcal{D}(p)$, $\mathcal{S}(p)$ and $\mathcal{I}(p)$, respectively.]

Figure 1. Comparison of entropies for several values of q for two-event systems ($\mathcal{P} = \{p, 1 - p\}$). The dashed curve represents the hybrid entropy
$\mathcal{D}_{0.4}$ which violates the maximality axiom.


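3. For a given $q \in \mathbb{R}$; $\mathcal{D}(A \cup B) = \mathcal{D}(A) + \mathcal{D}(B|A) + (1-q)\,\mathcal{D}(A)\,\mathcal{D}(B|A)$ with

$$ \mathcal{D}(B|A) = f^{-1}\left( \sum_k \varrho_k(q)\, f(\mathcal{D}(B|A = A_k)) \right), $$

and $\varrho_k(q) = (p_k)^q / \sum_k (p_k)^q$ (distribution $\mathcal{P}$ corresponds to the experiment A). The function f is invertible and positive
in $[0, \infty)$.

4. $\mathcal{D}(p_1, p_2, \ldots, p_n, 0) = \mathcal{D}(p_1, p_2, \ldots, p_n)$.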


Note particularly that due to the non-linear nature of the non-additivity condition there is no need to select a
normalization condition for $\mathcal{D}_q$. In Ref. [45] it was shown that the above axioms allow for only one class of solutions,
leading to an entirely new family of physically conceivable entropy functions. For the reader's convenience the basic
steps of the proof are sketched in Appendix A. In particular, the resulting hybrid entropy has the following form:

$$ \mathcal{D}_q(A) = \frac{1}{1-q}\left( e^{-(1-q)^2\, d\mathcal{I}_q/dq} \sum_{k=1}^{n} (p_k)^q - 1 \right) = \ln_{\{q\}} e^{-\langle \ln \mathcal{P} \rangle_q} , $$   (9)

where $\langle \ln \mathcal{P} \rangle_q = \sum_k \varrho_k(q) \ln p_k$ is the escort average of $\ln p_k$.


Let us further remark that axiom 4 restricts the possible values of q to q > 0. This is because $\mathcal{D}_q$ would otherwise
tend to infinity if some of the $p_k$ tended to zero. The latter would be counter-intuitive, because without changing
the probability distribution we would gain infinite information. The value q = 0 must also be ruled out on the basis
of axiom 2, because $\mathcal{D}_0$ would yield an expression that depends not on the probability distribution $\mathcal{P}$ but only on the
number of outcomes (or events), i.e., $\mathcal{D}_0$ would be insensitive to the system (source). In addition, by further analysis
in Appendix A, supported by the concept of Schur-concavity in Section 5, we show that $\mathcal{D}_q$ is well-defined only for
$q \geq 1/2$. In particular, for $q < 1/2$ the entropy $\mathcal{D}_q$ has a local minimum at $\mathcal{P} = \{1/n, \ldots, 1/n\}$ (rather than a maximum) and
therefore it does not fulfill axiom 2. Some basic properties of the hybrid entropy $\mathcal{D}_q$ are presented in Appendix B.
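The two equivalent forms of the hybrid entropy in Eq. (9) and the failure of axiom 2 for q < 1/2 can be probed numerically. The sketch below is illustrative only: it assumes strictly positive probabilities, uses a central finite difference (step `h`, an arbitrary choice) for $d\mathcal{I}_q/dq$, and compares a two-event distribution {0.3, 0.7} with the uniform one.

```python
import math

def escort_mean_log(p, q):
    """<ln P>_q = sum_k rho_k(q) ln p_k with the escort weights rho_k(q)."""
    k = sum(x ** q for x in p)
    return sum(x ** q * math.log(x) for x in p) / k

def hybrid(p, q):
    """D_q in the escort form ln_q exp(-<ln P>_q); assumes p_k > 0 and q != 1."""
    return (math.exp(-(1.0 - q) * escort_mean_log(p, q)) - 1.0) / (1.0 - q)

def renyi(p, q):
    return math.log(sum(x ** q for x in p)) / (1.0 - q)

def hybrid_via_renyi(p, q, h=1e-5):
    """D_q from the first form of Eq. (9), with dI_q/dq from a central difference."""
    d_iq = (renyi(p, q + h) - renyi(p, q - h)) / (2.0 * h)
    k = sum(x ** q for x in p)
    return (math.exp(-(1.0 - q) ** 2 * d_iq) * k - 1.0) / (1.0 - q)

p = [0.5, 0.3, 0.2]
print(hybrid(p, 0.7), hybrid_via_renyi(p, 0.7))       # both forms agree

uniform, skewed = [0.5, 0.5], [0.3, 0.7]
print(hybrid(skewed, 0.4) > hybrid(uniform, 0.4))     # True: uniform is not the maximum
print(hybrid(skewed, 2.0) < hybrid(uniform, 2.0))     # True: uniform wins for q = 2
```

The q = 0.4 comparison corresponds to the dashed curve of Fig. 1.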


Before studying further implications of the formula (9), there are two immediate consequences which warrant
special mention. The first is that, from the condition $d\mathcal{I}_q/dq \leq 0$ (see Section [?]), we have

$$ \mathcal{D}_q(A) \; \begin{cases} \geq \mathcal{S}_q(A) & \text{if } q \leq 1, \\ \leq \mathcal{S}_q(A) & \text{if } q \geq 1, \end{cases} $$   (10)


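where equality holds if and only if q = 1 or $d\mathcal{I}_q/dq = 0$. This means that either $\mathcal{D}_q(A)$ and $\mathcal{S}_q(A)$ jointly coincide
with Shannon's entropy or that $\mathcal{P}$ is uniform or {1, 0, ..., 0}. Hence, combining this with the inequalities between THC,
Rényi and Shannon entropies, we obtain

$$ 0 \leq \mathcal{H}(\mathcal{P}) \leq \mathcal{I}_q(\mathcal{P}) \leq \mathcal{S}_q(\mathcal{P}) \leq \mathcal{D}_q(\mathcal{P}) \leq \ln_{\{q\}} n \quad \text{for } 1/2 \leq q \leq 1, $$
$$ 0 \leq \mathcal{D}_q(\mathcal{P}) \leq \mathcal{S}_q(\mathcal{P}) \leq \mathcal{I}_q(\mathcal{P}) \leq \mathcal{H}(\mathcal{P}) \leq \ln n \quad \text{for } q \geq 1 . $$   (11)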
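As an illustration of the chain (11), the following sketch evaluates all four entropies for a hypothetical three-event distribution at q = 0.7 and q = 2. The helper `chain` is an assumed name that returns the quantities in the order in which (11) lists them, so that the result should be non-decreasing.

```python
import math

def ln_q(x, q):
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def shannon(p):
    return -sum(x * math.log(x) for x in p)

def renyi(p, q):
    return math.log(sum(x ** q for x in p)) / (1.0 - q)

def thc(p, q):
    return (sum(x ** q for x in p) - 1.0) / (1.0 - q)

def hybrid(p, q):
    k = sum(x ** q for x in p)
    mean_log = sum(x ** q * math.log(x) for x in p) / k   # <ln P>_q
    return ln_q(math.exp(-mean_log), q)

def chain(p, q):
    """Entropies ordered as in (11); assumes p_k > 0 and q != 1."""
    n = len(p)
    h, i, s, d = shannon(p), renyi(p, q), thc(p, q), hybrid(p, q)
    if q < 1:
        return [0.0, h, i, s, d, ln_q(n, q)]
    return [0.0, d, s, i, h, math.log(n)]

p = [0.5, 0.3, 0.2]   # hypothetical distribution
for q in (0.7, 2.0):
    values = chain(p, q)
    print(q, [round(v, 4) for v in values],
          all(a <= b for a, b in zip(values, values[1:])))
```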


The result (11) implies that by investigating the information measure $\mathcal{D}_q$ with q < 1 we receive more information
than by restricting our investigation just to the entropies $\mathcal{I}_q$ or $\mathcal{S}_q$. On the other hand, when q > 1 both $\mathcal{I}_q$ and $\mathcal{S}_q$ are
more informative than $\mathcal{D}_q$. The first set of inequalities is also valid for $q < 1/2$ but the last relation to $\ln_{\{q\}} n$ is not true
for the hybrid entropy. A practical illustration of the above inequalities can be seen in Fig. 1 for the simple distribution
$\mathcal{P} = \{p, 1 - p\}$.

In practical cases one usually requires more than one q to gain more complete information about the system. In
fact, when the entropies $\mathcal{I}_q$ or $\mathcal{S}_q$ are used, it is necessary to know them for all q in order to obtain full information
on a given statistical system [35]. For ensuing applications in strange attractors the reader may consult Ref. [46]; for
reconstruction theorems see, e.g., Refs. [4, 35].

The second comment to be made concerns the fact that when the statistical system in question is a multifractal (see footnote 3)
then relations (C.2)-(C.6) assert that

$$ (1-q)^2 \frac{d\mathcal{I}_q}{dq} = (a - f(a)) \ln \varepsilon = \ln\left( \sum_{k}^{N(a)} p_k(\varepsilon) \right), $$   (12)


where the summation runs only over support boxes of size $\varepsilon$ with the scaling exponent a. Alternatively, we could have
started with the first relation in Eq. (A.19) and used the multifractal canonical relations (see Ref. [35]), in which case
the result would have been again (12). So for the coarse-grained multifractal with the mesh size $\varepsilon$ the corresponding
entropy $\mathcal{D}_q$ reads

$$ \mathcal{D}_q(A) = \frac{1}{1-q}\left( \frac{\sum_{k=1}^{n} (p_k(\varepsilon))^q}{\sum_{k}^{N(a)} p_k(\varepsilon)} - 1 \right). $$   (13)


Now, the passage from multifractals to single-dimensional statistical systems is done by assuming that the a-interval
is infinitesimally narrow and that the PDF is smooth [33, 47]. In such a case Cvitanović's condition [47] holds, namely
both a and f(a) collapse to $a = f(a) = D$ and $q = f'(a) = 1$. So, for example, for a statistical system with a smooth
PDF and the support space $\mathbb{R}^d$ the relation (13) implies that the entropy $\mathcal{D}_q$ coincides with Shannon's $\mathcal{H}$. In this
connection it is important to stress that the similarity of (13) with THC entropy is only apparent. In order to have THC
entropy one needs to have $N(a) = n$, i.e., the entire probability measure must be accumulated around the unifractal
with the scaling exponent a. According to the Billingsley (or curdling) theorem [48, 49] this is possible only when
$a = f(a) = D$, i.e., only when $\mathcal{D}_q = \mathcal{H}$. As a byproduct of Eq. (11) we may notice that for single-dimensional systems
with smooth PDF's $\mathcal{S}_q$ and $\mathcal{I}_q$ must approach Shannon's entropy. We remark that this may help to understand
why Shannon's entropy plays such a predominant role in the physics of single-dimensional sets.

In what follows, we examine the class of distributions that represent maximizers for D q (A) subject to constraint 
imposed by the average value of energy. 


4. MaxEnt distribution 

According to information theory, the MaxEnt principle yields distributions which reflect least bias and maximum
uncertainty about information not provided to a recipient (i.e., observer). An important feature of the usual Gibbsian
MaxEnt formalism is that the maximal value of entropy is a concave function of the values of the prescribed constraints
(moments), and the maximizing probabilities are all greater than zero [50]. The first is important for thermodynamical
stability and the second for mathematical consistency. In this section we will see that both mentioned features hold
true also in the case of the $\mathcal{D}_q$ entropy.

Let us first address the issue of maximizers for $\mathcal{D}_q$. To this end we shall seek the conditional extremum of $\mathcal{D}_q$
subject to the constraints imposed by the averaged value of energy E (or generally any random quantity representing
a constant of the motion) in the form

$$ \langle E \rangle_r = \sum_k \varrho_k(r)\, E_k . $$   (14)


3 The necessary essentials on multifractals are presented in Appendix C.
For future convenience we initially keep r not necessarily coincident with q. Taking into account the normalization
condition for the $p_i$ we ought to extremize the functional

$$ L_{q,r}(\mathcal{P}) = \mathcal{D}_q(\mathcal{P}) - \Omega\, \frac{\sum_k (p_k)^r E_k}{\sum_k (p_k)^r} - \Phi \sum_k p_k , $$   (15)

with $\Omega$ and $\Phi$ being the Lagrange multipliers. Setting the derivatives of $L_{q,r}(\mathcal{P})$ with respect to $p_1, p_2, \ldots$, etc., to zero,
we obtain

$$ \frac{\partial L_{q,r}(\mathcal{P})}{\partial p_i} = e^{(q-1)\langle \ln \mathcal{P} \rangle_q} \left[ q\left( \langle \ln \mathcal{P} \rangle_q - \ln p_i \right) - 1 \right] \frac{(p_i)^{q-1}}{\sum_k (p_k)^q} - \Omega\, r \left( E_i - \langle E \rangle_r \right) \frac{(p_i)^{r-1}}{\sum_k (p_k)^r} - \Phi = 0, \quad i = 1, 2, \ldots, n. $$   (16)

Note that when both q and r approach 1, (16) reduces to the usual condition for Shannon's maximizer. This, in turn,
ensures that in the $(q, r) \to (1, 1)$ limit the maximizer of $\mathcal{D}_q$ is Gibbs's canonical-ensemble distribution. Let us now
concentrate on the two most relevant situations, namely when r = q and r = 1.


4.1. The r = q case

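When we decide to use r = q (i.e., when the non-linear moment constraints are implemented via the escort
distribution) it follows from (16) that

$$ \Phi\, \frac{\sum_k (p_k)^q}{(p_i)^{q-1}} = e^{(q-1)\langle \ln \mathcal{P} \rangle_q} \left[ q\left( \langle \ln \mathcal{P} \rangle_q - \ln p_i \right) - 1 \right] - q\,\Omega \left( E_i - \langle E \rangle_q \right). $$   (17)

Multiplying both sides of (17) by $\varrho_i(q)$, summing over i and taking into account the normalization condition $\sum_i p_i = 1$ we obtain

$$ \Phi = -e^{(q-1)\langle \ln \mathcal{P} \rangle_q} \quad \Rightarrow \quad \frac{\ln(-\Phi)}{q-1} = \langle \ln \mathcal{P} \rangle_q \quad \Rightarrow \quad \mathcal{D}_q(\mathcal{P})\big|_{\max} = \frac{1}{q-1}\,(\Phi + 1). $$   (18)

Plugging result (18) back into (17) we obtain after some algebra

$$ (p_i)^{1-q} \sum_k (p_k)^q = q \ln p_i + 1 - \frac{q \ln(-\Phi)}{q-1} - \frac{q\,\Omega}{\Phi}\left( E_i - \langle E \rangle_q \right), $$   (19)

which must be true for any index i. On the substitution

$$ \mathcal{E}_i = 1 - \frac{q \ln(-\Phi)}{q-1} - \frac{q\,\Omega}{\Phi}\left( E_i - \langle E \rangle_q \right), $$   (20)

this leads to the equation

$$ \kappa\, (p_i)^{1-q} = q \ln p_i + \mathcal{E}_i . $$   (21)

Here we have denoted $\sum_k (p_k)^q \equiv \kappa$. Equation (21) has the solution

$$ p_i = \left[ \frac{q}{\kappa (q-1)}\, W\!\left( \frac{\kappa (q-1)}{q}\, e^{(q-1)\mathcal{E}_i/q} \right) \right]^{1/(1-q)} = \exp\left\{ \frac{W\!\left( \frac{\kappa (q-1)}{q}\, e^{(q-1)\mathcal{E}_i/q} \right)}{q-1} - \frac{\mathcal{E}_i}{q} \right\}, $$   (22)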


with W(x) being the Lambert W-function [51].
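The structure of the solution (22) can be explored with a few lines of code. The sketch below is an illustration under explicit assumptions: the principal branch $W_0$ is computed by a hand-written Halley iteration (a library routine would normally be preferred), and $\kappa$ and $\mathcal{E}_i$ are treated as given numbers rather than being determined self-consistently from the constraints. The names `lambert_w0` and `maxent_probability` and the numerical inputs are hypothetical.

```python
import math

def lambert_w0(x, tol=1e-14, max_iter=64):
    """Principal branch W_0(x) for x >= -1/e, computed by Halley's iteration."""
    if x < -1.0 / math.e:
        raise ValueError("W_0 is real only for x >= -1/e")
    if x < 0.0:
        # Expansion around the branch point x = -1/e as a starting guess.
        s = math.sqrt(max(0.0, 2.0 * (math.e * x + 1.0)))
        w = -1.0 + s - s * s / 3.0
    elif x <= math.e:
        w = x / (1.0 + x)
    else:
        w = math.log(x) - math.log(math.log(x))
    for _ in range(max_iter):
        if w <= -1.0:          # branch point reached, nothing left to iterate
            break
        ew = math.exp(w)
        f = w * ew - x
        dw = f / (ew * (w + 1.0) - (w + 2.0) * f / (2.0 * w + 2.0))
        w -= dw
        if abs(dw) <= tol * (1.0 + abs(w)):
            break
    return w

def maxent_probability(q, kappa, cal_e):
    """Both forms of Eq. (22) for given q != 1, kappa and E_i (all assumed inputs)."""
    x = kappa * (q - 1.0) / q * math.exp((q - 1.0) * cal_e / q)
    w = lambert_w0(x)
    first = (q * w / (kappa * (q - 1.0))) ** (1.0 / (1.0 - q))
    second = math.exp(w / (q - 1.0) - cal_e / q)
    return first, second

for q, kappa, cal_e in ((2.0, 0.5, 3.0), (0.5, 1.5, 2.0)):
    p1, p2 = maxent_probability(q, kappa, cal_e)
    residual = kappa * p1 ** (1.0 - q) - (q * math.log(p1) + cal_e)   # Eq. (21)
    print(q, p1, p2, residual)
```

In both sample cases the two forms of (22) coincide and the residual of Eq. (21) vanishes to rounding accuracy.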

A couple of comments are now in order. First, the $p_i$'s as prescribed by (22) are positive for any value of q > 0. This
is a straightforward consequence of the following two identities [?]:

$$ W(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1} n^{n-2}}{(n-1)!}\, x^n , $$   (23)

$$ W(x) = x\, e^{-W(x)} . $$   (24)
Indeed, Eq. (23) ensures that for x < 0 also W(x) < 0 and hence W(x)/x > 0. Thus for 0 < q < 1 the positivity of the $p_i$'s
is a simple consequence of the first part of (22). Positivity for q > 1 follows directly from the relation (24) and the
second part of (22).

Second, as $q \to 1$ the entropy $\mathcal{D}_q \to \mathcal{H}$ and we expect that the $p_i$'s defined by (22) should approach the Gibbs
canonical-ensemble distribution in this limit. To see that this is indeed the case, let us note that

$$ \Phi|_{q=1} = -1, \quad \mathcal{E}_i|_{q=1} = 1 + \mathcal{H} + \Omega\left( E_i - \langle E \rangle \right), \quad \text{and} \quad \kappa|_{q=1} = 1. $$   (25)

Then

$$ p_i|_{q=1} = e^{-\left[ \mathcal{H} + \Omega\left( E_i - \langle E \rangle \right) \right]} = e^{\Omega (F - E_i)} = e^{-\Omega E_i}/Z , $$   (26)

(here F is the Helmholtz free energy) which after the identification $\Omega|_{q=1} = \beta$ leads to the desired result. Note also that
(22) is invariant under a uniform translation of the energy spectrum, i.e., the corresponding $p_i$ is independent of the
choice of the energy origin.

Third, there are situations when Eq. (21) has no solution, or it gives a solution for $p_i \notin [0, 1]$. To see this, we may
notice that when q > 1, the left-hand side of (21) is greater than $\kappa$, from which it follows that $\mathcal{E}_i > \kappa$ for all i's. For
q < 1 the left-hand side of (21) acquires values from $[0, \kappa]$ which (after using the fact that $q < 1 < \kappa$) leads again
to the condition $\mathcal{E}_i > \kappa$. In both cases the $\mathcal{E}_i$ are therefore positive. Thus, for energies for which $\Delta_q E_i = E_i - \langle E \rangle_q$ is
too negative, Eq. (21) has no solution, and the corresponding occupation probability is zero. Contrary to MaxEnt
distributions of other commonly used entropies, there exist energy levels here for which MaxEnt distributions of
$\mathcal{D}_q$ have zero occupation probabilities. This might provide a natural conceptual playground for statistical systems
with energy gaps (e.g., disordered systems, carbon nanotubes) or for systems with various super-selection rules (e.g.,
first-quantized relativistic systems).

Finally, there does not seem to be any simple method for a unique determination of $\Phi$ and $\Omega$ from the constraint
conditions (see footnote 4). In fact, only asymptotic situations for large and vanishingly small $\Omega$ can be successfully resolved (this
will be relegated to Sections 4.1.2 and 4.1.3). There exist, however, systems of practical interest, namely multifractal
systems, where we can give to relations (22) a very satisfactory physical interpretation, without resolving $\langle E \rangle_q$
in terms of $\Phi$ and $\Omega$.


4.1.1. Multifractal case 

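In the case when a statistical system under investigation fits the multifractal paradigm (see footnote 5), we can cast Eq. (21) in the
form

$$ \varepsilon^{\tau(q) + a_i (1-q)} = 1 + q \left[ a_i - \langle a \rangle_q(\varepsilon) \right] \left[ 1 + \frac{\Omega}{\Phi} \right] \ln \varepsilon , $$   (27)

where $\tau$ and $a_i$ are the correlation exponent and the Lipschitz-Hölder exponent, respectively. Note that the q-mean $\langle a \rangle_q(\varepsilon)$ at
the coarse-grained scale $\varepsilon$ is proportional to the q-mean of the log-PDF, namely

$$ \langle \ln \mathcal{P} \rangle_q = \sum_k \varrho_k(q) \ln p_k \approx \sum_k \varrho_k(q)\, a_k \ln \varepsilon = \langle a \rangle_q(\varepsilon) \ln \varepsilon . $$   (28)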


So, in particular, '[> 




as can be directly deduced from Eq. (18).


Equation (27) has several important implications. Firstly, we remind the reader that in the long-wave limit (i.e.,
when $\varepsilon \to 0$), one can use the analogy with ordinary statistical thermodynamics and interpret $\langle a \rangle_q$ as the most likely value
of the "energy" of a system immersed in a heat bath with the effective inverse temperature $\beta = q$ (see, e.g., Ref. [35]).
This is a version of the Billingsley (or curdling) theorem [48, 49, 68], which states that the Hausdorff dimension of the
set on which the escort probability $\varrho_k(q)$ is concentrated is $f(\langle a \rangle_q) = q \langle a \rangle_q - \tau(q)$. In addition, the relative probability
of the complement set approaches zero when $\varepsilon \to 0$. This in turn means that for each q there exists one scaling

4 In conventional statistical physics one does not solve $\Omega$ in terms of the averaged energy (i.e., internal energy $U$), since $\Omega$ can be identified with the inverse temperature, which is a much more fundamental quantity than $U$. In fact, it is $U$ that is typically given as a function of $\Omega$.

5 For a brief introduction to multifractals see Appendix C.

exponent, namely $a_i = \langle a \rangle_q$, which dominates, e.g., the partition function $\kappa$, whereas $p_i$'s with other Lipschitz-Hölder exponents have only a marginal contribution.

Note that the aforesaid indeed mimics the situation occurring in equilibrium statistical physics. There, in the canonical formalism, one works with a (usually infinite) ensemble of identical systems with all possible energy configurations. But only the configurations with $E_i \approx \langle E \rangle_\beta$ dominate the partition function in the thermodynamic limit. The choice of temperature $T = 1/\beta$ then prescribes the contributing energy configurations.

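Secondly, for small $\varepsilon$ we have

$$ \tau(q) + a_i(1-q) \sim \frac{\ln\left\{1 + q\left[a_i - \langle a \rangle_q(\varepsilon)\right]\left(1 + \frac{\Omega}{\Phi}\right)\ln\varepsilon\right\}}{\ln\varepsilon}. \tag{29} $$

The right-hand side is non-trivial only when

$$ \left|1 + \frac{\Omega}{\Phi}\right| \lesssim \frac{1}{\sqrt{-\ln\varepsilon}} \tag{30} $$

[note that $|a_i - \langle a \rangle_q| \sim 1/\sqrt{-\ln\varepsilon}$, see Appendix C]. In such a case Eq. (29) can be recast in the form

$$ \tau(q) + a_i(1-q) \sim q\left(1 + \frac{\Omega}{\Phi}\right)\left[a_i - \langle a \rangle_q(\varepsilon)\right], \tag{31} $$

implying that $\Omega/|\Phi| = (2q-1)/q$. With the help of (30) this means that $q \in [1 - 1/\sqrt{-\ln\varepsilon},\, 1 + 1/\sqrt{-\ln\varepsilon}]$. Bearing this in mind, we can write the single-cell probability $p_i \sim \varepsilon^{a_i}$ as

$$ p_i \sim \left[1 + (1-q)\left(a_i - \langle a \rangle_q\right)\ln\varepsilon\right]^{1/(1-q)}. \tag{32} $$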

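In multifractals it is more customary to consider the total probability of a phenomenon with a scaling exponent $a_i$, i.e., $P_i(a) \sim \varepsilon^{a_i - f(a_i)}$. To this end we can first utilize a simple quadratic expansion

$$ f(a_i) - f(\langle a \rangle_q) = q\left(a_i - \langle a \rangle_q\right) + \frac{1}{2} f''(\langle a \rangle_q)\left(a_i - \langle a \rangle_q\right)^2 + \cdots = q\left(a_i - \langle a \rangle_q\right) + \frac{\left(a_i - \langle a \rangle_q\right)^2}{2(\Delta a)^2 \ln\varepsilon} + \cdots . \tag{33} $$

In the last equality we have employed Eqs. (C.7)-(C.8). Note also that the higher-order terms in the expansion (33) are of the order $O((-\ln\varepsilon)^{-3/2})$. From (27) and (33) we then get

$$ P_i^{1-q}(a) \propto 1 + q\left[a_i - \langle a \rangle_q\right]\left(1 + \frac{\Omega}{\Phi}\right)\ln\varepsilon - (1-q)\,q\left[a_i - \langle a \rangle_q\right]\ln\varepsilon - (1-q)\,\frac{1}{2}\frac{\left(a_i - \langle a \rangle_q\right)^2}{(\Delta a)^2}. \tag{34} $$

Since for values $a_i$ close to $\langle a \rangle_q$ the distribution $P_i$ must acquire (due to the curdling theorem) a non-trivial value in the limit $\varepsilon \to 0$, the logarithmic divergences in (34) must cancel each other, yielding the simple condition $\Omega = q|\Phi|$. With this we can finally write

$$ P_i \propto \left[1 - (1-q)\frac{\left(a_i - \langle a \rangle_q\right)^2}{2(\Delta a)^2}\right]^{1/(1-q)}. \tag{35} $$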


This distribution is encountered in a number of multifractal systems. A paradigmatic example can be found in the statistical description of the intermittent evolution of fully developed turbulence. In such a case $P_i(a)$ describes the distribution of singularity exponents of the velocity gradient [61]. In addition, the parameter $q$ satisfies the scaling relation

$$ \frac{1}{1-q} = \frac{1}{a_-} - \frac{1}{a_+}, \tag{36} $$

where $a_\pm$ are defined by $f(a_\pm) = 0$. Such a scaling is a manifestation of the mixing property. In Ref. [61] it was further shown that the $q$-variance $(\Delta a)^2$ can be related to the phenomenologically important intermittency exponent $\mu$.
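
To get a feeling for the shape of the distribution (35) and for the scaling relation (36), one can evaluate both numerically. The following is a minimal sketch only: the function names and the numerical values used in the usage note (mean exponent, variance, zeros of $f$) are illustrative assumptions and are not taken from the text.

```python
import math

def q_gaussian_weight(a, a_mean, var, q):
    """Unnormalized weight [1 - (1-q)(a - a_mean)^2 / (2 var)]^(1/(1-q)), cf. Eq. (35)."""
    x = (a - a_mean) ** 2 / (2.0 * var)
    if abs(q - 1.0) < 1e-12:
        return math.exp(-x)            # Gaussian limit q -> 1
    base = 1.0 - (1.0 - q) * x
    if base <= 0.0:
        return 0.0                     # outside the support (can only happen for q < 1)
    return base ** (1.0 / (1.0 - q))

def q_from_zeros(a_minus, a_plus):
    """Solve 1/(1-q) = 1/a_minus - 1/a_plus for q, cf. Eq. (36)."""
    return 1.0 - 1.0 / (1.0 / a_minus - 1.0 / a_plus)
```

For instance, with the hypothetical zeros `a_minus = 0.5` and `a_plus = 1.5` the relation (36) gives `q = 0.25`, while for `q > 1` the weight decays as a power law instead of being cut off at a finite distance from the mean.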
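4.1.2. “High-temperature” expansion

Let us now make an important remark concerning the asymptotic behavior of $p_i$ in regard to $\Omega$. If we assume that the Lagrange multiplier $\Omega \ll 1$ then from (24) the following expansion holds

$$ W\!\left(\frac{\kappa(q-1)}{|\Phi| q}\, e^{(q-1)/q} \exp\!\left(-(q-1)\frac{\Omega}{\Phi}\Delta_q E_i\right)\right) \approx W(x)\left[1 - (1-q)\,\Omega^* \Delta_q E_i\right], \tag{37} $$

with

$$ \Delta_q E_i = E_i - \langle E \rangle_q, \qquad \Omega^* = \frac{\Omega}{|\Phi|\,(W(x) + 1)}, \qquad x = \frac{\kappa(q-1)}{|\Phi| q}\, e^{(q-1)/q}. \tag{38} $$

Hence, if we use the relation (22) we can write

$$ p_i = \frac{\left[1 - (1-q)\,\Omega^* \Delta_q E_i\right]^{1/(1-q)}}{\sum_k \left[1 - (1-q)\,\Omega^* \Delta_q E_k\right]^{1/(1-q)}} = Z^{-1}\left[1 - (1-q)\,\Omega^* \Delta_q E_i\right]^{1/(1-q)}, \tag{39} $$

with

$$ Z = \sum_k \left[1 - (1-q)\,\Omega^* \Delta_q E_k\right]^{1/(1-q)} = \left[\frac{q}{\kappa(q-1)}\, W(x)\right]^{1/(q-1)}. \tag{40} $$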

The distribution (39) agrees with the so-called 3rd version of thermostatistics introduced by Tsallis et al. [52]. It might also be formally identified with the maximizer of Rényi's entropy [58]. Clearly, $\Omega^*$ is not a Lagrange multiplier, but $\Omega^*$ passes to $\beta$ at $q \to 1$ (in fact, $\Phi \to -1$, $\Omega \to \beta$ and $W(x) \to 0$ at $q \to 1$). Note also that when $\Omega = 0$ (i.e., no energy constraint) then $p_i = 1/n$, which reconfirms the fact that $\mathcal{D}_q$ attains its largest value for the uniform distribution.
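
The first-order step behind the “high-temperature” expansion can be sketched as follows. The block is only an illustration based on the standard derivative of the Lambert W-function; the small exponent $\delta$ (proportional to $\Omega\,\Delta_q E_i$) is a shorthand introduced here and is not part of the original notation.

```latex
% Standard identity: W'(z) = W(z) / [ z (1 + W(z)) ],  valid for z \neq 0, -1/e.
% First-order Taylor expansion in the small exponent \delta:
W\!\left(x\,e^{\delta}\right)
  \approx W(x) + x\,\delta\,W'(x)
  = W(x)\left[1 + \frac{\delta}{1 + W(x)}\right].
```

The factor $1/(1 + W(x))$ produced by the derivative is the origin of the $W(x) + 1$ term that appears in the effective parameter $\Omega^*$.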


4.1.3. “Low-temperature” expansion 

From the physical standpoint it is the asymptotic behavior at $\Omega \gg 1$ (or more precisely at $\Omega|(q-1)/\Phi| \gg 1$), i.e., the “low-temperature” expansion, that is most intriguing. This is because the branching properties of the Lambert W-function at negative argument values make the structure of $\mathcal{P}$ rather non-trivial. We thus split our task into four distinct cases:

a1) $(q - 1) > 0$ and $\Delta_q E < 0$,

a2) $(q - 1) > 0$ and $\Delta_q E > 0$,

b1) $(q - 1) < 0$ and $\Delta_q E < 0$,

b2) $(q - 1) < 0$ and $\Delta_q E > 0$.

Cases a1) and a2) are much simpler to start with, as the argument of $W$ is positive. $W$ is then a real and single-valued function which belongs to the principal branch $W_0$, see Fig. 2. When $\Delta_q E < 0$ then a1) implies the asymptotic expansion


$$ W(z) \approx z \quad \Rightarrow \quad p_i = Z_1^{-1} \exp\!\left(-\frac{\Omega}{|\Phi|}\Delta_q E_i\right), \tag{41} $$

with

$$ Z_1 = e^{1/q} \left(\frac{1}{|\Phi|}\right)^{1/(q-1)}. \tag{42} $$

Note that in this case $p_i$ is of a Boltzmann type ($\langle E \rangle_q$ can be canceled against the same term in $Z_1$).
On the other hand, the a2) situation implies the asymptotic expansion [51]

$$ W(z) \approx \ln(z) - \ln(\ln(z)) \quad \Rightarrow \quad p_i = Z_2^{-1}\left[1 - (1-q)\,\Omega^* \Delta_q E_i\right]^{1/(1-q)}, \tag{43} $$

Figure 2. Two real branches of the Lambert W-function. Solid line: $W_0(x) = W(x)$ defined for $-1/e \le x < +\infty$ (so far considered). Dashed line: $W_{-1}(x)$ defined for $-1/e \le x < 0$. The two branches meet at the point $(-1/e, -1)$.


Figure 3. A plot of the “low-temperature” MaxEnt distribution (41), (43); vertical axis: $P$, horizontal axis: $E - \langle E \rangle_q$. The parameters of the plot are chosen in the following way: $\kappa = 0.01$, $\Phi = -0.68$, $q = 30$ and $\Omega = 0.5$. The distribution is normalized to 1 on the interval $\Delta_q E \in [-0.5, 0.5]$.




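with

$$ Z_2 = \left[\frac{q}{\kappa(q-1)} \ln\!\left(\frac{\kappa(q-1)}{|\Phi| q} \exp\frac{q-1}{q}\right)\right]^{1/(q-1)}; \qquad \Omega^* = \frac{\Omega}{|\Phi|}\, \frac{1}{\ln\!\left(\frac{\kappa(q-1)}{|\Phi| q} \exp\frac{q-1}{q}\right)}. \tag{44} $$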


Although the distribution (43) formally agrees with the Tsallis et al. distribution, it cannot be identified with it as $\Omega^*$ does not tend to $\beta$ in the $q \to 1$ limit. In fact, the limit $q \to 1$ is prohibited in this case as it violates the “low-temperature” condition $\Omega|(q-1)/\Phi| \gg 1$. Note particularly that our MaxEnt distribution represents in the “low-temperature” regime a heavy-tailed distribution with a Boltzmannian outset. When $\Omega$ and $q > 1$ are fixed one may find $\kappa$ and $\Phi$ from the normalization condition

$$ Z_1^{-1} \sum_{k | \Delta_q E_k < 0} \exp\!\left(-\frac{\Omega}{|\Phi|}\Delta_q E_k\right) + Z_2^{-1} \sum_{k | \Delta_q E_k > 0} \left[1 - (1-q)\,\Omega^* \Delta_q E_k\right]^{1/(1-q)} = 1, \tag{45} $$

and the sewing condition at $\Delta_q E = 0$. However, because the “low-temperature” approximation does not allow one to probe regions with small $\Delta_q E$, one must numerically optimize the sewing by interpolating the forbidden parts of the $\Delta_q E$ axis. An example of such a numerical optimization is presented in Fig. 3.

Cases b1) and b2) are technically more involved, because $q < 1$ causes the argument of $W(\cdots)$ to be negative. In case b1) we obtain for low temperatures that

$$ \frac{\kappa(q-1)}{|\Phi| q}\, e^{(q-1)/q} \exp\!\left(-(q-1)\frac{\Omega}{\Phi}\Delta_q E_i\right) \to -\infty . \tag{46} $$

Nevertheless, the complex Lambert W-function has a branch cut in the interval $(-\infty, -1/e]$, so the real-valued Lambert W-function is defined only for $x \ge -1/e$ and Eq. (24) has no real solution. This situation corresponds to the previous discussion about the existence of a solution of Eq. ED.

In case b2) there exist two solutions of Eq. (24), i.e., $W_0(z)$ and $W_{-1}(z)$ (see Fig. 2). In the case of the principal branch $W_0(z)$, the Lambert W-function can be approximated as $W(z) \approx z$, and the solution corresponds to the case a1). In the case of the branch $W_{-1}(z)$, the asymptotic expansion for $z \to 0^-$ is

$$ W_{-1}(z) \approx \ln(-z) - \ln(-\ln(-z)), \tag{47} $$

so the resulting probability is similar to the case a2), only with

$$ \Omega^* = \frac{\Omega}{|\Phi|}\, \frac{1}{\ln\!\left(\frac{\kappa(1-q)}{|\Phi| q} \exp\frac{q-1}{q}\right)}. \tag{48} $$

We should stress that for all cases it is necessary to check the validity of the asymptotic expansion and its applicability to the MaxEnt distribution. In some cases the expansion can violate the condition $p_i \le 1$, and then it is not possible to use such approximations.
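
The branch structure and the asymptotic forms used in this subsection can be checked numerically. The following pure-Python sketch (no external libraries) is an illustration only: the Halley iteration and the starting guesses are a generic textbook choice and not the authors' procedure, and all function names are assumptions introduced here.

```python
import math

def lambert_w(x, branch=0, tol=1e-14, max_iter=200):
    """Real Lambert W: returns w with w*exp(w) = x.

    branch=0  -> principal branch W_0   (x >= -1/e, w >= -1)
    branch=-1 -> lower branch     W_{-1} (-1/e <= x < 0, w <= -1)
    """
    if x < -1.0 / math.e:
        raise ValueError("no real solution for x < -1/e")
    if branch == -1 and x >= 0.0:
        raise ValueError("W_{-1} is real only for -1/e <= x < 0")
    # Square-root series around the branch point (-1/e, -1).
    s = math.sqrt(max(0.0, 2.0 * (math.e * x + 1.0)))
    if s < 1e-6:
        return -1.0 + s if branch == 0 else -1.0 - s
    if branch == 0:
        if x < -0.25:
            w = -1.0 + s
        elif x < math.e:
            w = math.log1p(x)
        else:
            w = math.log(x) - math.log(math.log(x))
    else:
        if x < -0.25:
            w = -1.0 - s
        else:
            w = math.log(-x) - math.log(-math.log(-x))
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - x
        if f == 0.0:
            break
        # Halley step for f(w) = w*exp(w) - x
        dw = f / (ew * (w + 1.0) - (w + 2.0) * f / (2.0 * w + 2.0))
        w -= dw
        if abs(dw) <= tol * (1.0 + abs(w)):
            break
    return w

def w0_large_z(z):
    """Leading asymptotics of W_0 for z -> +infinity."""
    return math.log(z) - math.log(math.log(z))

def wm1_small_z(z):
    """Leading asymptotics of W_{-1} for z -> 0 from below."""
    return math.log(-z) - math.log(-math.log(-z))
```

For a negative argument such as `-0.1` the two calls `lambert_w(-0.1)` and `lambert_w(-0.1, branch=-1)` return the two distinct real solutions, while any argument below `-1/e` raises an error, which mirrors the absence of a real solution in case b1).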


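4.2. The r = 1 case

When $r = 1$ is chosen (i.e., when the constraints are implemented via the usual linear averaging) then Eq. (16) implies

$$ \Phi = e^{(q-1)\langle \ln \mathcal{P} \rangle_q}\left[q\left(\langle \ln \mathcal{P} \rangle_q - \ln p_i\right) - 1\right] \frac{(p_i)^{q-1}}{\sum_k (p_k)^q} - \Omega\left(E_i - \langle E \rangle\right). \tag{49} $$

Multiplying by $p_i$ and summing over $i$ we obtain the constraint

$$ \Phi = -e^{(q-1)\langle \ln \mathcal{P} \rangle_q} \quad \Rightarrow \quad \frac{\ln(-\Phi)}{q-1} = \langle \ln \mathcal{P} \rangle_q . \tag{50} $$

Upon insertion of (50) into (49) we get a transcendental equation for $p_i$, which reads

$$ \kappa\, p_i^{1-q} = \frac{\Phi\left[q \ln p_i - \frac{q \ln(-\Phi)}{q-1} + 1\right]}{\Phi + \Omega\left(E_i - \langle E \rangle\right)} . \tag{51} $$

The solution can be again written in terms of the Lambert W-function, namely

$$ p_i = \left[\frac{q\,\Phi}{(q-1)\,\kappa\,(\Phi + \Omega \Delta E_i)}\, W\!\left(\frac{\kappa(q-1)}{|\Phi| q} \exp\!\left(\frac{q-1}{q}\right)\left(1 + \frac{\Omega}{\Phi}\Delta E_i\right)\right)\right]^{1/(1-q)} = \frac{1}{(-\Phi)^{1/(1-q)}\, e^{1/q}} \exp\!\left[\frac{1}{q-1}\, W\!\left(\frac{\kappa(q-1)}{|\Phi| q} \exp\!\left(\frac{q-1}{q}\right)\left(1 + \frac{\Omega}{\Phi}\Delta E_i\right)\right)\right]. \tag{52} $$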


Relations (23) and (24) again ensure that all $p_i$'s are positive. In addition, it is easy to check that in the limit case $q \to 1$, the formula (52) approaches the classical Gibbsian maximizer. Indeed, if we utilize the identities

$$ \kappa|_{q=1} = 1, \qquad \Phi|_{q=1} = -1, \qquad \left[(-\Phi)^{1/(1-q)}\right]\Big|_{q=1} = e^{-\langle \ln \mathcal{P} \rangle}, \qquad \Omega|_{q=1} = \beta, \tag{53} $$

then

$$ p_i|_{q=1} = \cdots = e^{-\beta E_i}/Z . \tag{54} $$

Similarly to Eq. (26), the relation (53) also represents an important consistency check of our procedure.

4.2.1. Multifractal case

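By following the same strategy as in the case $r = q$, we plug the multifractal scaling relations for $p_i \sim \varepsilon^{a_i}$ into Eq. (50) and use the fact that the role of $E_i$ is taken over by $-a_i \ln\varepsilon$. After a short calculation we arrive at

$$ \varepsilon^{\tau(q) + a_i(1-q)} \left[1 - \frac{\Omega}{\Phi}\left(a_i - \langle a \rangle_1\right)\ln\varepsilon\right] = 1 + q\left(a_i - \langle a \rangle_q\right)\ln\varepsilon, \tag{55} $$

which in the small-$\varepsilon$ limit yields

$$ \tau(q) + a_i(1-q) + \frac{\ln\left[1 - \frac{\Omega}{\Phi}\left(a_i - \langle a \rangle_1\right)\ln\varepsilon\right]}{\ln\varepsilon} \sim \frac{\ln\left[1 + q\left(a_i - \langle a \rangle_q\right)\ln\varepsilon\right]}{\ln\varepsilon} \sim 0. \tag{56} $$

The last relation follows from the fact that for $q > 1/2$ (cf. Appendix A) the expression goes to zero in the small-$\varepsilon$ limit. Note that Eq. (56) implies a nontrivial behavior only for

$$ \frac{\Omega}{|\Phi|} \lesssim \frac{1}{\sqrt{-\ln\varepsilon}} . \tag{57} $$

This then gives that

$$ \tau(q) + a_i(1-q) \sim \frac{\Omega}{\Phi}\left(a_i - \langle a \rangle_1\right), \tag{58} $$

and hence $\Omega/|\Phi| = q - 1$. The latter shows in particular that $q \in [1,\, 1 + 1/\sqrt{-\ln\varepsilon}]$. Rather than dealing with the single-cell probability $p_i$ we can again address the (more relevant) total probability $P_i(a) \sim \varepsilon^{a_i - f(a_i)}$. By using the fact that (cf. Eq. (53))

$$ \tau(q) + \left[a_i - f(a_i)\right](1-q) + \frac{\Omega}{|\Phi|}\left(a_i - \langle a \rangle_1\right) + f(a_i)(1-q) \sim 0, \tag{59} $$

and the expansion

$$ f(a_i) - f(\langle a \rangle_1) = \left(a_i - \langle a \rangle_1\right) + \frac{1}{2} f''(\langle a \rangle_1)\left(a_i - \langle a \rangle_1\right)^2 + \cdots = \left(a_i - \langle a \rangle_1\right) + \frac{1}{2}\frac{\left(a_i - \langle a \rangle_1\right)^2}{(\Delta a)^2 \ln\varepsilon} + \cdots \tag{60} $$

[in the second equality we have used again the curdling theorem (see Appendix C)], we obtain

$$ P_i(a) \propto \left[1 - \frac{(1-q)}{2}\frac{\left(a_i - \langle a \rangle_1\right)^2}{(\Delta a)^2}\right]^{1/(1-q)} . \tag{61} $$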


This prescription naturally appears in the context of multiplicative cascades with the coarse-grained scaling $\varepsilon = 2^{-k}$ ($k \gg 1$). Again, the natural application would be in fully developed turbulence. The proximity of $q$ to one makes the previous distribution suitable for discussions concerning the dynamics on the measure-theoretic support, i.e., a set whose Hausdorff-Besicovitch dimension is $a(1) = f(a(1))$. In particular, it can be shown [59] that the measure-theoretic support describes the set on which the probability is concentrated.


4.2.2. “High-temperature” expansion 

Similarly to the $r = q$ case, we can find the “high-temperature” expansion by assuming that $\Omega \ll 1$. In such a
case we have 


Here 


d> 

( 0 ) + ClAEd 



K(q- 1 ) 

<&q 



Q 

1 + ^ AE > 


W(x) [1- (1- 9 )Q* AEi] . 


q O W(x) 

q - 1 tp(W(r) + 1) ’ 


K(q ~ 1 ) 

(Pq 


exp 



(62) 


(63) 


P. Jizba and J. Korbel / Physica A 00 (2015) 7 4231 


Through (52) this implies that

$$ p_i = Z^{-1}\left[1 - (1-q)\,\Omega^* \Delta E_i\right]^{1/(1-q)}, \tag{64} $$

with

$$ Z = \sum_k \left[1 - (1-q)\,\Omega^* \Delta E_k\right]^{1/(1-q)} = \left[\frac{q}{\kappa(q-1)}\, W(x)\right]^{1/(q-1)} . \tag{65} $$

Relation (64) coincides with the Tsallis-type distribution that is historically known as Bashkirov's 1st version of thermostatistics [58].

Note in passing that by using the identity $\lim_{q \to 1} W(x)/(q-1) = 1$, we obtain that the factor $\Omega^*$ approaches the inverse temperature $\beta$ in the limit $q \to 1$, as it should.

4.2.3. “Low-temperature” expansion

We now wish to consider the “low-temperature” expansion, i.e., $\Omega \gg 1$. We again divide the situation into four sub-cases:

a1) $(q - 1) > 0$ and $\Delta E < 0$,

a2) $(q - 1) > 0$ and $\Delta E > 0$,

b1) $(q - 1) < 0$ and $\Delta E < 0$,

b2) $(q - 1) < 0$ and $\Delta E > 0$.

Unlike the case $r = q$, the sub-cases group into two qualitatively distinct classes:

1. cases a2) and b1) lead to the asymptotic expansion $W(z) \approx \ln(z) - \ln(\ln(z))$, because the argument of $W$ is then positive.

2. cases a1) and b2) lead to the situation when the Lambert W-function is not defined, which corresponds to the fact that Eq. (51) has no solution.

So in particular, we see that there are cases when our hybrid entropy cannot be consistently used over the whole temperature range. It can be at best used as an effective entropy in higher-temperature regimes. This might be particularly pertinent in high-energy particle phenomenology, where a host of phase transitions happen under conditions that are far from thermal equilibrium (e.g., the chiral phase transition in QCD and the ensuing quark-gluon plasma formation). In the first case, i.e., when the asymptotic expansion exists, the probability distribution can be written in the form


Pi 


( 

1(9- 1) 

q 

■(1 +GA Ej) 

w 


t 2 ?) 

] + In [1 + ClAEj] 


!/(?-!) 


( 66 ) 


Contrary to $r = q$, the resulting distribution has a functionally different form from both the Boltzmann distribution and the Tsallis distribution, even in the generalized form, i.e., with the self-referential temperature. For large temperatures, the second term in the denominator is negligible and the distribution acquires a power-law-like behavior. We again note that it is necessary to check the consistency of the asymptotic expansions.


5. Concavity and Schur-concavity of $\mathcal{D}_q$

In this section we discuss the concavity properties of $\mathcal{D}_q$. When referring to the concavity of entropies, one has to distinguish between two types of concavity. In thermodynamics, the important issue is to show whether or not the thermodynamical entropy is a concave function of extensive variables. This means showing that $\mathcal{D}_q|_{\max}$ is a concave function under the constraints, as in the case of Gibbsian MaxEnt. Note that in contrast to the information-theoretic entropy $\mathcal{D}_q$, $\mathcal{D}_q|_{\max}$ is the system entropy, i.e., it depends on the actual system state variables.

In information theory, the significance of concavity lies in the fact that it automatically ensures the validity of the maximality axiom. In the case of $\mathcal{D}_q$, it suffices to explore the concavity issue only for $e^{-\langle \ln \mathcal{P} \rangle_q}$, because $\ln_{\{q\}}$ is a concave and non-decreasing function for all $q > 0$,

$$ \frac{\partial^2}{\partial p_i^2}\, e^{-\langle \ln \mathcal{P} \rangle_q} = e^{-\langle \ln \mathcal{P} \rangle_q}\left[\left(\frac{\partial \langle \ln \mathcal{P} \rangle_q}{\partial p_i}\right)^2 - \frac{\partial^2 \langle \ln \mathcal{P} \rangle_q}{\partial p_i^2}\right]. \tag{67} $$

It can be shown that the bracket is always negative for $q \in [\frac{1}{2}, 1]$. In contrast, for $q > 1$ the term $-\partial^2 \langle \ln \mathcal{P} \rangle_q / \partial p_i^2$ is unbounded while the first term remains bounded, and therefore the function cannot be concave for all $p_i$'s.

However, concavity is only a sufficient condition that ensures the maximality axiom. As is known, e.g., from the case of Rényi entropy, there are examples of non-concave entropies which still have a well-defined global maximum at $\mathcal{P} = \{1/n, \ldots, 1/n\}$. In fact, there exist weaker concepts that ensure the validity of the maximality axiom. Among these, the most prominent is the notion of Schur-concavity. An overview of applications of Schur-concavity can be found in Refs. [71, 72]. This concept is based on the idea of majorization. We say that a probability distribution $\mathcal{P} = \{p_1, \ldots, p_n\}$ is majorized by a distribution $\mathcal{Q} = \{q_1, \ldots, q_n\}$ if for the ordered probability vectors $p_{(1)} \ge p_{(2)} \ge \ldots$, resp. $q_{(1)} \ge q_{(2)} \ge \ldots$, it holds that $\sum_{k=1}^{j} p_{(k)} \le \sum_{k=1}^{j} q_{(k)}$, where $j = 1, \ldots, n-1$ (for $j = n$ the inequality is fulfilled automatically from normalization). We denote this by $\mathcal{P} \prec \mathcal{Q}$. We say that a function $F$ is Schur-concave if $\mathcal{P} \prec \mathcal{Q}$ implies $F(\mathcal{P}) \ge F(\mathcal{Q})$. Schur-concavity automatically preserves the maximality axiom, because the uniform distribution is majorized by every other distribution. Shi et al. have shown [73] that a special subclass of functions called Gini means (defined, e.g., in Ref. [74]), which can be expressed in the form

$$ G(q; x, y) = \exp\!\left(\frac{x^q \ln x + y^q \ln y}{x^q + y^q}\right), \tag{68} $$

is for $(x, y) \in \mathbb{R}^2_+$ a Schur-convex function of $(x, y)$ when $2q > 1$ (this is a consequence of [73, Theorem 1] for $r = s$).

It is then easy to show that $\mathcal{D}_q$ is a Schur-concave function for $q > \frac{1}{2}$. As a consequence, for $q > \frac{1}{2}$, $\mathcal{D}_q$ fulfills the maximality axiom. Moreover, Ref. [73] discussed the case $q \in (0, \frac{1}{2})$ and concluded that one cannot say anything about the Schur-convexity or Schur-concavity of $\mathcal{D}_q$ on this interval. For illustration, in Fig. |T| we compare three types of entropies, i.e., $\mathcal{D}_q$, $\mathcal{S}_q$ and $\mathcal{I}_q$, for various values of $q$ on the distribution $\mathcal{P} = \{p, 1 - p\}$, and we observe that $\mathcal{D}_{0.4}$ is neither Schur-convex nor Schur-concave, which is caused by the fact that the maximum is not at $\mathcal{P} = \{1/n, \ldots, 1/n\}$.
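
The majorization test and the behaviour of the hybrid entropy described above can be tried out on small examples. The sketch below is a minimal illustration that assumes the form $\mathcal{D}_q = \ln_{\{q\}} e^{-\langle \ln \mathcal{P} \rangle_q}$ with the escort average $\langle \cdot \rangle_q$; the function names and the test distributions are hypothetical choices made here.

```python
import math

def hybrid_entropy(p, q):
    """D_q(P) = ln_q(exp(-<ln P>_q)), with <.>_q the escort average (q > 0)."""
    support = [x for x in p if x > 0.0]
    z = sum(x ** q for x in support)
    mean_log = sum((x ** q / z) * math.log(x) for x in support)
    if abs(q - 1.0) < 1e-12:
        return -mean_log                      # Shannon limit
    return (math.exp(-(1.0 - q) * mean_log) - 1.0) / (1.0 - q)

def is_majorized_by(p, r, eps=1e-12):
    """True if p is majorized by r (both of equal length and normalized)."""
    ps = sorted(p, reverse=True)
    rs = sorted(r, reverse=True)
    sum_p = sum_r = 0.0
    for a, b in zip(ps, rs):
        sum_p += a
        sum_r += b
        if sum_p > sum_r + eps:               # a partial sum of p exceeds that of r
            return False
    return True
```

With `uniform = [1/3, 1/3, 1/3]` and `other = [0.5, 0.3, 0.2]`, the uniform vector is majorized by `other` and has the larger entropy for, e.g., `q = 0.7` and `q = 2`. For `q = 0.4`, however, the two-event distribution `[0.1, 0.9]` already has a larger hybrid entropy than `[0.5, 0.5]`, which illustrates the failure of the maximality axiom below $q = 1/2$.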


6. Conclusions and outlooks 

We have presented a plausible generalization of the information entropy concept. Our approach is based on an 
axiomatic merger of two currently widely used information measures: Renyi’s and Tsallis-Havrda-Charvat’s. Such 
a merger is natural from the mathematical point of view as both above measures have an axiomatic underpinning 
with a very similar axiomatics. From the physics viewpoint the above merger is interesting because it combines two 
entropies with analogous MaxEnt distributions but with very different scope of applicability in physics. 

We have shown that the maximizers for $\mathcal{D}_q$ subject to constant averaged energy are represented in terms of the Lambert W-function. The Lambert W-function is a special function that appears in numerous exactly solvable statistical systems. The Tonks gas [53], the Richards growth model and Lotka-Volterra models [54] may serve as examples. The Lambert W-function was recently also used in quantum statistics [55] and in the statistics of weak long-range repulsive potentials [53]. This usage nicely bolsters our suggestion that a typical playground for $\mathcal{D}_q$ could be in statistical systems with both self-similarity and non-locality. In addition, as a byproduct, we have obtained during our analysis some new mathematical properties of the Lambert W-function.

Due to the complicated analytical structure of the MaxEnt distribution we have resorted in our discussion to the “low-” and “high-temperature” asymptotic regimes. We have shown that under certain parameter conditions these have the heavy-tailed behavior that is identical with Tsallisian maximizers. The fact that this is true only asymptotically might at first sight be a bit surprising, as there exists a perception that both THC and Rényi's entropies have the same maximizer and hence the merger entropy should again possess the same MaxEnt distribution. This anticipation is clearly erroneous. Indeed, both Rényi entropy and THC maximizers have the same functional form, but their respective “temperature” parameters $\beta$ are entirely different functions of $q$, and in the case of THC entropy $\beta$ is even self-referential (i.e., it depends on the distribution itself) [58].

In summary, we have shown that there exists a well-defined sense in which one can combine the Rényi and THC entropic paradigms. We have found the associated one-parametric class of entropy measures, namely (9), and the ensuing MaxEnt distributions (22). It can be rightly objected that apart from the axiomatic side more is needed to consider $\mathcal{D}_q$ as a legitimate object of statistical physics. In this connection one should, however, stress that the presented entropy has a number of desirable attributes; like THC entropy it is a one-parametric class of entropies satisfying the non-extensive $q$-additivity, it goes over into $\mathcal{H}(\mathcal{P})$ in the $q \to 1$ limit, it complies with thermodynamic stability, continuity, symmetry, expansivity, decisivity, Schur-concavity, etc. On that basis it appears that both $\mathcal{D}_q$ and THC entropies have an equal right to serve as a generalization of statistical thermodynamics.

Appendix A. Derivation of $\mathcal{D}_q$ from J-A axioms

In this appendix we show the basic steps in the derivation of the functional form of the hybrid entropy $\mathcal{D}_q$.

Let us first denote $\mathcal{D}(1/n, 1/n, \ldots, 1/n) = \mathcal{L}(n)$. Axioms 2 and 5 then imply that $\mathcal{L}(n) = \mathcal{D}(1/n, \ldots, 1/n, 0) \le \mathcal{D}(1/(n+1), \ldots, 1/(n+1)) = \mathcal{L}(n+1)$. Consequently, $\mathcal{L}$ is a non-decreasing function of $n$. To determine the explicit form of $\mathcal{L}(n)$ we will assume that $\mathcal{A}^{(1)}, \ldots, \mathcal{A}^{(m)}$ are independent experiments, each with $r$ equally probable outcomes, so

$$ \mathcal{D}(\mathcal{A}^{(k)}) = \mathcal{D}(1/r, \ldots, 1/r) = \mathcal{L}(r), \qquad (1 \le k \le m). \tag{A.1} $$

Repeated application of axiom 3 then leads to

$$ \mathcal{D}(\mathcal{A}^{(1)} \cup \mathcal{A}^{(2)} \cup \ldots \cup \mathcal{A}^{(m)}) = \mathcal{L}(r^m) = \sum_{k=1}^{m} \binom{m}{k} (1-q)^{k-1} \mathcal{D}^k(\mathcal{A}^{(k)}) = \frac{1}{(1-q)}\left[\left(1 + (1-q)\mathcal{L}(r)\right)^m - 1\right], \tag{A.2} $$

where $\binom{m}{k}$ is the binomial coefficient. By assuming that (A.2) can be extended from $m \in \mathbb{N}_+$ to $\mathbb{R}$, we can take the partial derivative of both sides of (A.2) with respect to $m$, and by setting $m = 1$ we obtain the differential equation

$$ \frac{(1-q)\, d\mathcal{L}}{\left(1 + (1-q)\mathcal{L}\right)\ln\left(1 + (1-q)\mathcal{L}\right)} = \frac{dr}{r \ln r} . \tag{A.3} $$

The general solution of (A.3) has the form

$$ \mathcal{L}(r) \equiv \mathcal{L}_q(r) = \frac{1}{1-q}\left(r^{c(q)} - 1\right). \tag{A.4} $$

The integration constant $c(q)$ will be determined shortly. Right now we just note that because at $q = 1$ Eq. (A.2) boils down to $\mathcal{L}(r^m) = m\mathcal{L}(r)$ we must have $c(1) = 0$. In addition, the monotonicity of $\mathcal{L}(r)$ ensures that $c(q)/(1-q) \ge 0$.
To proceed further, let us consider the experiment with outcomes $\mathcal{A} = (A_1, A_2, \ldots, A_n)$ and the distribution $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$. Assume moreover that the $p_k$ ($1 \le k \le n$) are rational numbers, i.e.,

$$ p_k = \frac{g_k}{g}, \qquad \sum_{k=1}^{n} g_k = g, \qquad g_k \in \mathbb{Z}^+ . \tag{A.5} $$

Let us have, in addition, an experiment $\mathcal{B} = (B_1, B_2, \ldots, B_g)$ with the associated distribution $\mathcal{Q} = \{q_1, q_2, \ldots, q_g\}$. We split $(B_1, B_2, \ldots, B_g)$ into $n$ groups containing $g_1, g_2, \ldots, g_n$ outcomes, respectively. Consider now a particular situation in which whenever the event $A_k$ in $\mathcal{A}$ happens, then in $\mathcal{B}$ all $g_k$ events of the $k$-th group occur with the equal probability $1/g_k$ and all the other events in $\mathcal{B}$ have probability zero. Hence


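$$ \mathcal{D}(\mathcal{B}|\mathcal{A} = A_k) = \mathcal{D}(1/g_k, \ldots, 1/g_k) = \mathcal{L}_q(g_k), \tag{A.6} $$

and so axiom 3 implies that

$$ \mathcal{D}(\mathcal{B}|\mathcal{A}) = f^{-1}\!\left(\sum_{k=1}^{n} \rho_k(q)\, f(\mathcal{L}_q(g_k))\right). \tag{A.7} $$

On the other hand, in the stated system the entropy $\mathcal{D}(\mathcal{A} \cup \mathcal{B})$ can be easily evaluated. Realizing that the joint probability distribution corresponding to $\mathcal{A} \cup \mathcal{B}$ is

$$ \mathcal{R} = \{r_{kl} = p_k\, q_{l|k}\} = \Big\{\underbrace{\frac{p_1}{g_1}, \ldots, \frac{p_1}{g_1}}_{g_1 \times}, \underbrace{\frac{p_2}{g_2}, \ldots, \frac{p_2}{g_2}}_{g_2 \times}, \ldots, \underbrace{\frac{p_n}{g_n}, \ldots, \frac{p_n}{g_n}}_{g_n \times}\Big\} = \{1/g, \ldots, 1/g\}, \tag{A.8} $$

we obtain that $\mathcal{D}(\mathcal{A} \cup \mathcal{B}) = \mathcal{L}_q(g)$. Applying axiom 3 together with (A.4) we get

$$ \mathcal{D}(\mathcal{A})\left[1 + (1-q)\, f^{-1}\!\left(\sum_k \rho_k(q)\, f\big(\mathcal{L}_q(p_k)[1 + (1-q)\mathcal{L}_q(g)] + \mathcal{L}_q(g)\big)\right)\right] = \mathcal{L}_q(g) - f^{-1}\!\left(\sum_k \rho_k(q)\, f\big(\mathcal{L}_q(p_k)[1 + (1-q)\mathcal{L}_q(g)] + \mathcal{L}_q(g)\big)\right). \tag{A.9} $$

Define $f_{(a,y)}(x) = f(-ax + y) \Rightarrow f^{-1}(x) - y = -a f^{-1}_{(a,y)}(x)$, then

$$ \mathcal{D}(\mathcal{A}) = \frac{f^{-1}_{(a, \mathcal{L}_q(g))}\!\left(\sum_k \rho_k(q)\, f_{(a, \mathcal{L}_q(g))}(-\mathcal{L}_q(p_k))\right)}{1 - (1-q)\, f^{-1}_{(a, \mathcal{L}_q(g))}\!\left(\sum_k \rho_k(q)\, f_{(a, \mathcal{L}_q(g))}(-\mathcal{L}_q(p_k))\right)}, \tag{A.10} $$

with $a = [1 + (1-q)\mathcal{L}_q(g)]$.

To proceed further, let us formally put $p_k = 1/r$. Eq. (A.4) then indicates that it is $\mathcal{L}_q(1/p_k)$ and not $\mathcal{L}_q(p_k)$ which represents the elementary information of order $q$ affiliated with $p_k$ (cf. Eq. (5)). It is thus convenient to reformulate (A.10) directly in terms of $\mathcal{L}_q(1/p_k)$. This can be done via the relation

$$ \mathcal{L}_q(p_k) = -\frac{\mathcal{L}_q(1/p_k)}{1 + (1-q)\mathcal{L}_q(1/p_k)} . \tag{A.11} $$

If we now write

$$ g(x) = f_{(a, \mathcal{L}_q(g))}\!\left(\frac{x}{1 + (1-q)x}\right), \tag{A.12} $$

we easily obtain from (A.10) that

$$ \mathcal{D}(\mathcal{A}) = g^{-1}\!\left(\sum_k \rho_k(q)\, g(\mathcal{L}_q(1/p_k))\right). \tag{A.13} $$

Moreover, if we set in the second part of axiom 3 $\mathcal{B} = \mathcal{A}$, then $\mathcal{D}(\mathcal{A})$ is given as

$$ \mathcal{D}(\mathcal{A}) = f^{-1}\!\left(\sum_k \rho_k(q)\, f(\mathcal{L}_q(1/p_k))\right). \tag{A.14} $$

Using the fact that two quasi-linear means with the same $\mathcal{P}$ are identical iff their respective Kolmogorov-Nagumo functions are linearly related 0], we may write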

$$ g(x) = f\!\left(\frac{-x + y}{1 + (1-q)x}\right) = \theta_q(y)\, f(x) + \vartheta_q(y). \tag{A.15} $$

Here $y = \mathcal{L}_q(g)$. In order to solve (A.15) we define $\varphi(x) = f(x) - f(0)$. With this notation Eq. (A.15) turns into

$$ \varphi\!\left(\frac{-x + y}{1 + (1-q)x}\right) = \theta_q(y)\, \varphi(x) + \varphi(y), \qquad \varphi(0) = 0. \tag{A.16} $$

By setting $x = y$ we obtain that $\theta_q(y) = -1$, and hence

$$ \varphi(x + y + (1-q)xy) = \varphi(x) + \varphi(y). \tag{A.17} $$

According to axiom 1 we may now extend (A.17) to real-valued $x$ and $y$. Eq. (A.17) is Pexider's functional equation, which can be solved by the standard method of iterations [45]. In [69] we have shown that (A.17) has only one non-trivial class of solutions, namely

$$ \varphi(x) = \frac{1}{\alpha} \ln\left[1 + (1-q)x\right]. \tag{A.18} $$

$\alpha$ is here a free parameter. By inserting this solution back into (A.14) we obtain

$$ \mathcal{D}_q(\mathcal{A}) = \frac{1}{1-q}\left(e^{-c(q)\sum_k \rho_k(q) \ln p_k} - 1\right) = \frac{1}{1-q}\left(\prod_k (p_k)^{-c(q)\rho_k(q)} - 1\right). \tag{A.19} $$

Note that the constant $\alpha$ got canceled. We have also denoted the explicit order of the entropy $\mathcal{D}$ with the subscript $q$. It remains to determine $c(q)$. Utilizing the conditional entropy constructed from (A.19) and using axiom 3, we obtain $c(q) = 1 - q$. As a result we can recast (A.19) into a more expedient form. By utilizing

$$ \langle \ln \mathcal{P} \rangle_q = (1-q)\frac{d\, \mathcal{I}_q(\mathcal{P})}{dq} - \mathcal{I}_q(\mathcal{P}), \tag{A.20} $$

the following result holds

$$ \mathcal{D}_q(\mathcal{A}) = \frac{1}{1-q}\left(e^{-(1-q)^2\, d\mathcal{I}_q/dq} \sum_{k=1}^{n} (p_k)^q - 1\right). \tag{A.21} $$
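
The equivalence of the escort-average form (A.19) with $c(q) = 1 - q$ and the Rényi-derivative form (A.21), as well as the $q$-additivity for independent experiments, can be verified numerically. The sketch below is an illustration under stated assumptions: all probabilities are taken strictly positive, $d\mathcal{I}_q/dq$ is approximated by a central difference, and the function names are introduced here.

```python
import math

def escort_mean_log(p, q):
    """<ln P>_q = sum_k rho_k(q) ln p_k with rho_k = p_k^q / sum_j p_j^q."""
    z = sum(x ** q for x in p)
    return sum((x ** q / z) * math.log(x) for x in p)

def hybrid_entropy(p, q):
    """Escort form: D_q = (exp(-(1-q) <ln P>_q) - 1) / (1 - q); Shannon at q = 1."""
    m = escort_mean_log(p, q)
    if q == 1.0:
        return -m
    return math.expm1(-(1.0 - q) * m) / (1.0 - q)

def renyi_entropy(p, q):
    """Renyi entropy I_q = ln(sum_k p_k^q) / (1 - q), q != 1."""
    return math.log(sum(x ** q for x in p)) / (1.0 - q)

def hybrid_entropy_a21(p, q, h=1e-5):
    """Form (A.21), with dI_q/dq estimated by a central difference of step h."""
    d_renyi = (renyi_entropy(p, q + h) - renyi_entropy(p, q - h)) / (2.0 * h)
    power_sum = sum(x ** q for x in p)
    return (math.exp(-(1.0 - q) ** 2 * d_renyi) * power_sum - 1.0) / (1.0 - q)

def joint_independent(p, r):
    """Joint distribution of two independent experiments."""
    return [a * b for a in p for b in r]
```

For independent experiments the conditional entropy reduces to the entropy of the second experiment, so one expects `hybrid_entropy(joint_independent(p, r), q)` to equal `D_p + D_r + (1 - q) * D_p * D_r`, where `D_p` and `D_r` are the entropies of the two factors.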


Restrictions on q from the maximality axiom

In the foregoing proof we have used axiom 2 to show that $\mathcal{L}(n+1) \ge \mathcal{L}(n)$, which in turn yielded $\mathcal{L}(n) = \ln_{\{q\}}(n)$, cf. Eq. (A.4). We have not, however, checked whether the global maximum is really at $\mathcal{P} = \{1/n, \ldots, 1/n\}$. In the situation when the entropy is a (Schur-)concave function on the probability space, we obtain the maximality directly. This is the case, e.g., for both the Rényi and THC entropy. Unfortunately, (Schur-)concavity of $\mathcal{D}_q$ is ensured only for certain values of $q$ (as discussed in Section 5). Here we illustrate the fact that $\mathcal{D}_q$ can have maxima at points other than $\mathcal{P} = \{1/n, \ldots, 1/n\}$. To this end we note from (9) that because $\ln_{\{q\}}(x)$ is a monotonic function for $x > 0$ and since $e^{-x}$ is a positive monotonic function on $\mathbb{R}$, we can consider only $\langle \ln \mathcal{P} \rangle_q$. For simplicity's sake, we present the analysis only for a probability distribution of two events, i.e., $\mathcal{P} = \{p, 1 - p\}$. The analysis for more outcomes is similar; the only difference is that one has to employ Lagrange multipliers to account for the fact that the probability vector is confined to a simplex.

Stationary points of $\langle \ln \mathcal{P} \rangle_q$ are solutions of the equation

$$ \frac{d}{dp}\, \frac{p^q \ln p + (1-p)^q \ln(1-p)}{p^q + (1-p)^q} = \frac{1}{Z(q)^2}\left[p^{2q-1} - (1-p)^{2q-1} + p^{q-1}(1-p)^q - p^q(1-p)^{q-1} - q\,p^q(1-p)^{q-1} \ln\!\left(\frac{1-p}{p}\right) + q\,p^{q-1}(1-p)^q \ln\!\left(\frac{p}{1-p}\right)\right] = 0. \tag{A.22} $$

The factor $Z(q)^2 = [p^q + (1-p)^q]^2$ is positive and can thus be omitted from the further analysis. After the substitution $y = p/(1-p)$ the previous equation reduces to

$$ y^{2q-1} - y^q(1 - q \ln y) + y^{q-1}(1 + q \ln y) - 1 = 0, \tag{A.23} $$

or alternatively to

$$ \Psi_q(y) = q \ln y - \frac{1 - y^{q-1} + y^q - y^{2q-1}}{y^q + y^{q-1}} = 0. \tag{A.24} $$

An interesting property of $\Psi_q$ is that $\Psi_q(1/y) = -\Psi_q(y)$ and $\Psi_1(y) = \ln y$.

The equation $\Psi_q(y) = 0$ has for $q \ge \frac{1}{2}$ only one solution, which is $y = 1$, or equivalently $p = \frac{1}{2}$. However, for $q < \frac{1}{2}$ there occur two more solutions, related by the reciprocity relation. As a consequence, from the nature of $\langle \ln \mathcal{P} \rangle_q$ one can deduce that the point $p = \frac{1}{2}$ corresponds to a local minimum, while the other two points represent global maxima. Eventually, the second axiom is violated for $q < \frac{1}{2}$, and $\mathcal{D}_q$ is therefore well defined only for $q \ge \frac{1}{2}$.
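
The appearance of additional stationary points below $q = 1/2$ can be seen directly by locating the roots of $\Psi_q$ numerically. The following sketch is only illustrative: the bisection routine, the bracketing interval and the sample values $q = 0.4$ and $q = 0.7$ are choices made here, not part of the original analysis.

```python
import math

def psi(y, q):
    """Psi_q(y) = q ln y - (1 - y^(q-1) + y^q - y^(2q-1)) / (y^q + y^(q-1)), cf. (A.24)."""
    num = 1.0 - y ** (q - 1.0) + y ** q - y ** (2.0 * q - 1.0)
    den = y ** q + y ** (q - 1.0)
    return q * math.log(y) - num / den

def bisect_root(f, lo, hi, tol=1e-13):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0.0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def nontrivial_stationary_point(q, lo=0.01, hi=0.5):
    """Root y* of Psi_q in (lo, hi); the corresponding probability is p* = y*/(1 + y*)."""
    return bisect_root(lambda y: psi(y, q), lo, hi)
```

For `q = 0.4` the call `nontrivial_stationary_point(0.4)` returns a root `y*` strictly below one, and `1 / y*` is a root as well because of the reciprocity $\Psi_q(1/y) = -\Psi_q(y)$. For `q = 0.7` the default bracket contains no sign change and the routine raises an error, consistent with $y = 1$ being the only solution.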


Appendix B. Basic properties of $\mathcal{D}_q$ entropy

In this appendix, we list some basic properties of the hybrid entropy $\mathcal{D}_q$.

Let us start with features that $\mathcal{D}_q$ shares with both Rényi's and THC entropies. These are

(a) $\mathcal{D}_q(\mathcal{P} = \{1, 0, \ldots, 0\}) = 0$

(b) $\mathcal{D}_q(\mathcal{P}) \ge 0$

(c) $\mathcal{D}_1 = \mathcal{I}_1 = \mathcal{S}_1 = \mathcal{H}$

(d) $\mathcal{D}_q$ involves a single free parameter, $q$

(e) $\mathcal{D}_q$ is symmetric, i.e., $\mathcal{D}_q(p_1, \ldots, p_n) = \mathcal{D}_q(p_{k(1)}, \ldots, p_{k(n)})$

(f) $\mathcal{D}_q$ is bounded

On the other hand, among features inherited from Rényi's entropy we can find that

(g) $\mathcal{D}_q(\mathcal{A}) = f^{-1}\left(\sum_k \rho_k(q) f(\mathcal{D}_q(\mathcal{A}_k))\right)$

(h) For single-dimensional statistical systems with continuous PDF $\mathcal{D}_q(\mathcal{A})$ reduces to $\mathcal{H}$

(i) $\mathcal{D}_q$ is a strictly decreasing function of $q$, i.e., $d\mathcal{D}_q/dq < 0$, for any $q > 0$

Result (i) follows from the fact that $\mathcal{D}_q$ is a monotonically decreasing function of $A_q = \sum_k \rho_k(q) \ln p_k$ (see Eq. (A.19)) and that $A_q$ is a monotonically increasing function of $q$; indeed

$$ \frac{dA_q}{dq} = \langle (\ln \mathcal{P})^2 \rangle_q - \langle \ln \mathcal{P} \rangle_q^2 \ge 0. \tag{B.1} $$

Here $\langle \ldots \rangle_q$ is defined with respect to the distribution $\rho_k(q)$. The last relation in (B.1) is Jensen's inequality. Note that $d\mathcal{D}_q/dq = 0$ happens only for the degenerate case $\mathcal{P} = \{1, \ldots, 0\}$ (and ensuing permutations).

Finally, properties taken over from THC entropy include

(j) $\max_{\mathcal{P}} \mathcal{D}_q(\mathcal{P}) = \mathcal{D}_q(\mathcal{P} = \{1/n, \ldots, 1/n\}) = \ln_{\{q\}} n$ (for $q > 1/2$)

(k) $\mathcal{D}_q$ is $q$-non-extensive, i.e., $\mathcal{D}(\mathcal{A} \cup \mathcal{B}) = \mathcal{D}(\mathcal{A}) + \mathcal{D}(\mathcal{B}|\mathcal{A}) + (1-q)\mathcal{D}(\mathcal{A})\mathcal{D}(\mathcal{B}|\mathcal{A})$

Appendix C. Some essentials of the multifractal formalism


We present here some essentials of the fractal and multiftactal calculus that are employed in the main body of the 
text. 

Fractals are sets with a generally non-integer dimension exhibiting property of self-similarity. The key charac¬ 
teristic of fractals is a fractal dimension which is defined as follows: Consider a set M embedded in a ^-dimensional 
space. Let us cover the set with a mesh of ^-dimensional cubes of size e' 1 and let N e (M) is a number of the cubes 
needed for the covering. The fractal dimension of M is then defined as 148. 591 


„ In N e (M) 

D — - lim- 

£->o In s 


(C.l) 


The dimension defined in (1C. Il l is also known as box-counting dimension. In most cases of interest the latter coincides 
with the Hausdorff-Besicovich dimension used by Mandelbrot [48]. 

Multifractals, on the other hand, are related to the study of a distribution of physical or other quantities on a generic 
support (be it or not fractal) and thus provide a move from the geometry of sets as such to geometric properties of 
distributions. Let us suppose that over some support (usually a subset of a metric space) is distributed a probability of 
a certain phenomenon. If we pave the support with a grid of spacing e and denote the integrated probability in the zth 
box as pj, then the scaling exponent a,- is defined [48,[59] 


Pile) ~ e a ‘. (C.2) 

The exponent $\alpha_i$ is called the singularity or Lipschitz-Hölder exponent. Counting the boxes $N(\alpha)$ in which $p_i$ has $\alpha_i \in (\alpha, \alpha + d\alpha)$,
the singularity spectrum $f(\alpha)$ is defined as [48, 59]


$$N(\alpha) \sim \varepsilon^{-f(\alpha)} . \tag{C.3}$$


Thus a multifractal is the ensemble of intertwined (uni)fractals, each with its own fractal dimension $f(\alpha_i)$. It is further
convenient to define a “partition function” [48]

$$Z(q) = \sum_i p_i^q = \int d\alpha' \, \rho(\alpha') \, \varepsilon^{-f(\alpha')} \, \varepsilon^{q\alpha'} , \tag{C.4}$$

($\rho(\alpha)$ is a proportionality function having its origin in relations (C.2) and (C.3)). In the small-$\varepsilon$ limit the method of
steepest descent yields the scaling

$$Z(q) \sim \varepsilon^{\tau(q)} , \tag{C.5}$$

with 

$$\tau(q) = \min_{\alpha} \left[ q\alpha - f(\alpha) \right], \quad f'(\alpha) = q \quad \text{and} \quad \tau'(q) = \alpha(q) . \tag{C.6}$$

These are precisely the Legendre transform relations. The scaling exponent $\tau$ is often called the correlation exponent. The Legendre
transform (C.6) ensures that the pairs $f(\alpha), \alpha$ and $\tau(q), q$ are conjugates comprising the same mathematical content.
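
The chain from the partition function (C.4) to the Legendre relations (C.6) can be sketched numerically. The example below is only an illustration under stated assumptions: it uses a binomial multiplicative cascade on the unit interval (a standard toy multifractal that does not appear in the text), for which $\tau(q) = -\log_2[p^q + (1-p)^q]$ is known in closed form, and it approximates $\tau'(q)$ by a central finite difference. All function names are hypothetical.

```python
import math


def cascade_probabilities(p, level):
    """Box probabilities p_i of a binomial multiplicative cascade on [0, 1]:
    each dyadic box passes fractions p and 1 - p of its probability to its halves."""
    probs = [1.0]
    for _ in range(level):
        probs = [w * m for w in probs for m in (p, 1.0 - p)]
    return probs


def tau(q, probs, eps):
    """Finite-eps estimate of tau(q) from Z(q) = sum_i p_i**q ~ eps**tau(q), cf. (C.4)-(C.5)."""
    z = sum(w ** q for w in probs)
    return math.log(z) / math.log(eps)


def spectrum_point(q, probs, eps, dq=1e-4):
    """Return (alpha(q), f(alpha(q))) using tau'(q) = alpha(q) and
    f(alpha) = q * alpha - tau(q), i.e. the Legendre relations (C.6)."""
    alpha = (tau(q + dq, probs, eps) - tau(q - dq, probs, eps)) / (2.0 * dq)
    return alpha, q * alpha - tau(q, probs, eps)


if __name__ == "__main__":
    p, level = 0.3, 10
    eps = 2.0 ** (-level)                      # box size at this cascade depth
    probs = cascade_probabilities(p, level)
    for q in (-2.0, 0.0, 1.0, 2.0, 4.0):
        alpha, f = spectrum_point(q, probs, eps)
        exact_tau = -math.log2(p ** q + (1.0 - p) ** q)
        print(q, tau(q, probs, eps), exact_tau, alpha, f)
```

Two simple checks follow from the definitions: $\tau(1) = 0$ because the $p_i$ are normalized, and at $q = 0$ the spectrum value $f = -\tau(0)$ equals the box-counting dimension of the support, which is 1 for this cascade.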

It is an important consequence of (C.4) that the relative fluctuations of the Lipschitz-Hölder exponent $\alpha$ around its
mean value $\langle \alpha \rangle_q$ are very small in the $\varepsilon \to 0$ limit. This is because


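$$\frac{\partial^2 (\ln Z(q))}{\partial q^2} = \left[ \langle \alpha^2 \rangle_q - \langle \alpha \rangle_q^2 \right] (\ln \varepsilon)^2 = (\Delta\alpha)^2 (\ln \varepsilon)^2 , \tag{C.7}$$

$$\frac{\partial^2 (\tau \ln \varepsilon)}{\partial q^2} = (d\alpha/dq) \ln \varepsilon = \left[ 1/f''(\alpha) \right] \ln \varepsilon . \tag{C.8}$$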


Since, by the scaling (C.5), both left-hand sides in (C.7) and (C.8) are identical, we can infer from the finiteness of $f''(\alpha)$ that the standard
deviation of $\alpha$ is of order $1/\sqrt{-\ln \varepsilon}$. So for small $\varepsilon$ the $\alpha$-fluctuations become negligible and almost all $\alpha_i$ are equal to $\langle \alpha \rangle_q$.
Note also that because the variance $(\Delta\alpha)^2 > 0$ and $\ln \varepsilon < 0$, we have that $f''(\alpha) < 0$, i.e., the function $f(\alpha)$ is concave.
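
The $1/\sqrt{-\ln \varepsilon}$ behaviour of the $\alpha$-fluctuations can be checked on a toy model. The sketch below again assumes a binomial multiplicative cascade (not part of the text) and assumes that the averages $\langle \cdots \rangle_q$ are taken with the weights $p_i^q / Z(q)$, which is what differentiating $\ln Z(q)$ twice with respect to $q$ produces. Function names are hypothetical.

```python
import math


def cascade_probabilities(p, level):
    """Box probabilities of a binomial multiplicative cascade with 2**level boxes."""
    probs = [1.0]
    for _ in range(level):
        probs = [w * m for w in probs for m in (p, 1.0 - p)]
    return probs


def alpha_std(p, q, level):
    """Standard deviation of alpha_i = ln p_i / ln eps under the weights p_i**q / Z(q)."""
    eps = 2.0 ** (-level)
    probs = cascade_probabilities(p, level)
    z = sum(w ** q for w in probs)
    weights = [w ** q / z for w in probs]
    alphas = [math.log(w) / math.log(eps) for w in probs]
    mean = sum(r * a for r, a in zip(weights, alphas))
    variance = sum(r * (a - mean) ** 2 for r, a in zip(weights, alphas))
    return math.sqrt(variance)


if __name__ == "__main__":
    p, q = 0.3, 2.0
    for level in (3, 6, 12):
        eps = 2.0 ** (-level)
        sigma = alpha_std(p, q, level)
        # If sigma ~ 1 / sqrt(-ln eps), this product should not depend on the level.
        print(level, sigma, sigma * math.sqrt(-math.log(eps)))
```

For this particular model $-\ln \varepsilon$ is proportional to the cascade depth, so quadrupling the depth should halve the standard deviation of $\alpha$.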

The fact that for a given $q$ the total probability of a phenomenon with a scaling exponent $\alpha_i$ is concentrated around
the value $\alpha_i \approx \langle \alpha \rangle_q$ is known as the curdling theorem [48] (or Billingsley theorem [49]), and it represents a particular
example of the so-called measure concentration phenomenon [60].

The multifractal formalism has direct applications in the turbulent flow of fluids [61], percolation [62], diffusion-limited
aggregation (DLA) systems [63], DNA sequences [64], finance [65, 67], string theory [66], etc. In chaotic
dynamical systems all $I_q$ are necessary to describe uniquely, e.g., strange attractors [46]. More generally, one may
argue [35] that when the outcome space is discrete then all $I_q$ (or $S_q$) with $q \in [1, \infty)$ are needed to reconstruct the
underlying distribution, while when the outcome space is a $d$-dimensional subset of $\mathbb{R}^d$ then all $I_q$ (or $S_q$), $q \in (0, \infty)$,
are required to pinpoint uniquely the underlying PDF. The latter can be viewed as information-theoretic variants
of Hausdorff's moment problem of mathematical statistics.

The connection of Rényi entropies with multifractals is established via relation (C.4). Note particularly that when
$\varepsilon$ is finite then $I_q$ plays the role of the Helmholtz free energy. A closer analysis of the related implications can be found,
e.g., in Refs. 0B
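
The link between Rényi entropies and the partition function (C.4) can be made explicit with a short numerical sketch. It assumes the standard definition $I_q = (1-q)^{-1} \ln \sum_i p_i^q$ (in nats), so that $I_q = \ln Z(q)/(1-q)$ and, by (C.5), $I_q / \ln(1/\varepsilon) \approx \tau(q)/(q-1)$ for small $\varepsilon$. The function names and the two-box example are illustrative assumptions, not taken from the text.

```python
import math


def renyi_entropy(probs, q):
    """Renyi entropy I_q in nats; q = 1 is treated as the Shannon limit."""
    if abs(q - 1.0) < 1e-12:
        return -sum(w * math.log(w) for w in probs if w > 0.0)
    z = sum(w ** q for w in probs if w > 0.0)   # partition function Z(q), cf. (C.4)
    return math.log(z) / (1.0 - q)


def entropy_per_log_scale(probs, q, eps):
    """I_q / ln(1/eps), which approximates tau(q) / (q - 1) when Z(q) ~ eps**tau(q)."""
    return renyi_entropy(probs, q) / math.log(1.0 / eps)


if __name__ == "__main__":
    # Two boxes of size eps = 1/2 carrying probabilities 0.3 and 0.7.
    probs, eps = [0.3, 0.7], 0.5
    for q in (0.0, 0.5, 1.0, 2.0):
        print(q, renyi_entropy(probs, q), entropy_per_log_scale(probs, q, eps))
```

For a uniform distribution over $n$ boxes every $I_q$ equals $\ln n$, so the $q$-dependence of $I_q$ is a direct probe of the non-uniformity of the underlying distribution.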